# Data Understanding

## Settings & Importing Libraries

Notes on the DataFrame names used in this notebook:
- df_NAME: the raw datasets as initially imported
- df_join: the joined dataset
- df_temp: a temporary dataset reused for each kind of operation
- df_dropped: the dataset without outliers
- df_new: the final dataset for the Data Understanding task
- any other df_X names are still to be evaluated

## Attribute names & types:
### Geography-related  -----
- continent  ::       object
- country  ::         object
- region  ::          object
### Vendor-related  -------
- vendor  ::          object
### RAM-related  ----------
- brand  ::           object
- ram_model  ::       object
- memory_type  ::     object
- id_ram  ::          object
- clock  ::           object
- memory_dim  ::      object
### Price-related  ---------
- currency  ::        object
- sales_usd  ::       float64
- sales_currency  ::  float64
### Time-related  ----------
- time_code  ::       datetime64[ns]
- week  ::            int64

In [1]:
#--0.0---------------------  SETTINGS -----------------------------------------
"""
Data Settings & Importing Libraries
"""

import pandas as pd
import numpy as np
import pylab as pl
import matplotlib as mpl
import matplotlib.pyplot as plt
import math
import seaborn as sns
from scipy import stats
import sys
import os
import collections
#!conda install --yes --prefix {sys.prefix} plotly
import plotly.io as pio
from pandas.plotting import scatter_matrix
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA


#%matplotlib inline 

pd.set_option('display.float_format', lambda x: '%.2f' % x)
#plt.rc('figure', figsize=(10, 8))
#plt.grid(True)
os.chdir('D:\\Dropbox\\Scuola\\Pisa\\Anno2\\Data Mining\\Esame\\Pratica\\python\\NewDataset')
dir = os.getcwd()


## Loading the data set

In [2]:
#   PRE-PROCESS
df_geo = pd.read_csv(dir + '\\Data\\geography.csv')
df_ram = pd.read_csv(dir + '\\Data\\ram.csv')
df_sales_ram = pd.read_csv(dir + '\\Data\\sales_ram.csv')
df_time = pd.read_csv(dir + '\\Data\\time.csv')
df_vendor = pd.read_csv(dir + '\\Data\\vendor.csv')

## Types of Attributes and basic checks

In [3]:
# `name` is used as the loop variable so the built-in `set` is not shadowed
for name in ["geography", "ram", "sales_ram", "time", "vendor"]:
    df = pd.read_csv(dir + '\\Data\\' + name + '.csv')
    print("\n\nDataset ", name)
    print("\nNumber of rows::", df.shape[0])
    print("\nNumber of columns::", df.shape[1])
    print("\nColumn Names::", df.columns.values.tolist())
    print()
    df.info()
    print()
    #print(df.dtypes)
    print(df.describe(include='all'))
    print()
    print(df.head())
    print("---------------------------------------------------------------")



Dataset  geography

Number of rows:: 75

Number of columns:: 6

Column Names:: ['Unnamed: 0', 'geo_code', 'continent', 'country', 'region', 'currency']

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 75 entries, 0 to 74
Data columns (total 6 columns):
Unnamed: 0    75 non-null int64
geo_code      75 non-null int64
continent     75 non-null object
country       75 non-null object
region        75 non-null object
currency      75 non-null object
dtypes: int64(2), object(4)
memory usage: 3.6+ KB

        Unnamed: 0  geo_code continent  country                  region  \
count        75.00     75.00        75       75                      75   
unique         nan       nan         3       11                      75   
top            nan       nan    Europe  Germany  mecklenburg-vorpommern   
freq           nan       nan        49       16                       1   
mean         38.00     38.00       NaN      NaN                     NaN   
std          21.79     21.79       NaN      Na

## Type Conversion 

In [4]:
# conversion of some attributes before the join operation
df_sales_ram['ram_code'] = df_sales_ram['ram_code'].astype(int)
df_sales_ram['vendor_code'] = df_sales_ram['vendor_code'].astype(int)
df_sales_ram['geo_code'] = df_sales_ram['geo_code'].astype(int)
df_sales_ram['Id'] = df_sales_ram['Id'].astype(object)
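A caveat on the casts above: `astype(int)` raises on missing values and silently truncates fractional ones. The following is a minimal sketch of a guarded cast; the helper name `to_int_checked` and the toy values are assumptions for illustration, not part of the notebook.

```python
import pandas as pd


def to_int_checked(series: pd.Series) -> pd.Series:
    """Cast a numeric column to int, refusing missing or fractional values."""
    if series.isna().any():
        raise ValueError(f"{series.name}: contains missing values")
    if not (series % 1 == 0).all():
        raise ValueError(f"{series.name}: contains non-integer values")
    return series.astype(int)


# Toy stand-in for df_sales_ram (values are made up for illustration).
toy_sales = pd.DataFrame({
    "ram_code": [1.0, 2.0, 3.0],
    "geo_code": [10.0, 11.5, 12.0],  # 11.5 would be truncated by a plain cast
})

ram_codes = to_int_checked(toy_sales["ram_code"])

try:
    to_int_checked(toy_sales["geo_code"])
    geo_ok = True
except ValueError:
    geo_ok = False
```

On the real data the same helper could be applied as `to_int_checked(df_sales_ram["ram_code"])`, so that an unexpected code value fails loudly before the join.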

### Preliminary operation for integration

In [5]:
# Some renaming
df_vendor = df_vendor.rename(columns={"name": "vendor"})
df_ram = df_ram.rename(columns={"name": "ram_model"})
df_ram = df_ram.rename(columns={"memory": "memory_dim"})
df_sales_ram = df_sales_ram.rename(columns={"Id": "id_ram"})
df_sales_ram = df_sales_ram.rename(columns={"sales_uds": "sales_usd"})

# Some dropping  
df_vendor = df_vendor.drop(columns=['Unnamed: 0'])
df_time = df_time.drop(columns=['Unnamed: 0'])
df_geo = df_geo.drop(columns=['Unnamed: 0'])
df_sales_ram = df_sales_ram.drop(columns=['Unnamed: 0'])

## Data Integration

In [6]:
# Join operations
df_join = df_sales_ram.join(df_ram.set_index('ram_code'), on='ram_code')
df_join = df_join.join(df_vendor.set_index('vendor_code'), on='vendor_code')
df_join = df_join.join(df_geo.set_index('geo_code'), on='geo_code')
df_join = df_join.join(df_time.set_index('time_code'), on='time_code')


# Post operations
df_join['time_code'] = pd.to_datetime(df_join['time_code'], format='%Y%m%d')
#-----Reduction of attributes, Some removal
df_join.drop(['geo_code', 'ram_code', 'year', 'day', 'vendor_code'], inplace=True, axis=1)
df_join['clock'] = df_join['clock'].astype(object)
df_join['memory_dim'] = df_join['memory_dim'].astype(object)
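The joins above attach each dimension table to the sales rows by indexing the dimension on its key. As a hedged illustration on made-up toy frames, the same result can be obtained with a left `merge`, whose `validate="many_to_one"` option additionally checks that the key is unique in the dimension table.

```python
import pandas as pd

# Toy stand-ins for the fact table and one dimension table (made-up values).
toy_sales = pd.DataFrame({"ram_code": [1, 2, 1], "sales_usd": [10.0, 20.0, 12.5]})
toy_ram = pd.DataFrame({"ram_code": [1, 2], "brand": ["ADATA", "G.SKILL"]})

# Same pattern as the notebook: index the dimension by its key, then join on it.
joined = toy_sales.join(toy_ram.set_index("ram_code"), on="ram_code")

# Equivalent left merge; validate raises MergeError if a key repeats in toy_ram.
merged = toy_sales.merge(toy_ram, on="ram_code", how="left", validate="many_to_one")

same_brands = joined["brand"].tolist() == merged["brand"].tolist()
```

Both variants keep one output row per sales row, which is what the row count reported later by `df_join.info()` relies on.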

### Data completeness

In [7]:
df_temp = df_join[(df_join.time_code.dt.year == 2013) & (df_join.time_code.dt.month == 3)]
print(len(df_temp))
df_join.drop(df_temp.index, inplace=True)

df_temp = df_join[(df_join.time_code.dt.year == 2018) & (df_join.time_code.dt.month == 4)]
print(len(df_temp))
df_join.drop(df_temp.index, inplace=True)

4872
46848
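The two filters above drop all rows from March 2013 and April 2018, presumably because those months are only partially covered by the data. A minimal sketch of how such partial months could be detected, assuming one timestamp per day at midnight as produced by the `%Y%m%d` parsing; the toy dates are made up.

```python
import calendar

import pandas as pd

# Toy stand-in for df_join["time_code"]: January 2014 is complete,
# February 2014 only has its first three days.
days = pd.date_range("2014-01-01", "2014-01-31").append(
    pd.date_range("2014-02-01", "2014-02-03")
)
toy = pd.DataFrame({"time_code": days})
toy["year"] = toy["time_code"].dt.year
toy["month"] = toy["time_code"].dt.month

# Distinct days observed in each (year, month)
coverage = (
    toy.groupby(["year", "month"])["time_code"]
    .nunique()
    .reset_index(name="observed_days")
)

# Calendar length of each month, to compare against the observed days
coverage["expected_days"] = [
    calendar.monthrange(int(y), int(m))[1]
    for y, m in zip(coverage["year"], coverage["month"])
]
coverage["complete"] = coverage["observed_days"] == coverage["expected_days"]
```

A month with `complete == False` is a candidate for removal; whether to drop it remains a modelling decision.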


## Data Statistics

In [8]:
df_join.info()
print("---------------------------------------------------------------")
print(df_join.describe(include='all'))
print("---------------------------------------------------------------")
print(df_join.head())
print("---------------------------------------------------------------")

#   Attributes split by type - REMOVE week & sales_uds
num_float = ['sales_usd', 'sales_currency']
cat = ["clock", "memory_dim", 'id_ram', 'time_code', 'brand', 'ram_model', 'memory_type', 'vendor', 'continent', 'country', 'region', "currency"]

#   to see the distinct values - USE CATEGORICAL ATTRIBUTES
for col in cat: 
    print("\nDistinct values in " + col + " : \t", df_join[col].unique())

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3360611 entries, 8 to 3412330
Data columns (total 16 columns):
id_ram            object
time_code         datetime64[ns]
sales_usd         float64
sales_currency    float64
brand             object
ram_model         object
memory_dim        object
memory_type       object
clock             object
vendor            object
continent         object
country           object
region            object
currency          object
month             int64
week              int64
dtypes: datetime64[ns](1), float64(2), int64(2), object(11)
memory usage: 435.9+ MB
---------------------------------------------------------------
           id_ram            time_code   sales_usd  sales_currency    brand  \
count  3360611.00              3360611  3360611.00      3360611.00  3360611   
unique    3109.00                 1820         nan             nan       48   
top       5764.00  2016-04-11 00:00:00         nan             nan  G.SKILL   
freq      8840.

In [None]:
df_temp = df_join.sort_values( by="sales_usd", ascending=False)
print(df_temp.describe())
print()

fig, ax = plt.subplots()
#ax.set_title('\nOutliers of ' +col+ ' in the Dataset')
ax.boxplot(df_temp["sales_usd"])

print("---------------------------------------------------------------")


df_temp1 = df_temp[df_temp["sales_usd"] > 5000]
print(df_temp1.head(50))
print()

print(df_temp[(df_temp["id_ram"] == 3753) & (df_temp["brand"] == "ADATA") & (df_temp["memory_dim"] == 4.00) & (df_temp["memory_type"] == "DDR3") & (df_temp["clock"] == 1600) & (df_temp["time_code"].dt.year > 2015) ])
print()

df_temp = df_join[df_join["week"] == 53] # more than 50%
print(df_temp)
print()

df_temp = df_join[["week", "time_code"]]
df_temp = df_temp[df_temp["week"] == 52]
print(df_temp)
print()

# the distribution of the weeks looks normal; the possible snag, I think, could be
    # between the 1st and the 2nd week
    # in the example repository I see values from 1 to 52
    # leap years are not involved
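The boxplot in the cell above uses matplotlib's default whisker rule (1.5 times the interquartile range). The same fences can be computed explicitly to list the flagged rows. This is a sketch on made-up prices; the helper name `iqr_fences` is an assumption, and the notebook itself uses a fixed threshold of 5000 instead.

```python
import pandas as pd


def iqr_fences(values: pd.Series, k: float = 1.5):
    """Return the (lower, upper) fences at k times the interquartile range."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Toy stand-in for df_join["sales_usd"] (made-up values).
toy_prices = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 5000.0], name="sales_usd")

low, high = iqr_fences(toy_prices)
outliers = toy_prices[(toy_prices < low) | (toy_prices > high)]
```

With heavily skewed prices the upper fence can flag a large share of legitimate rows, so it is worth inspecting the flagged values before building `df_dropped`.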

In [None]:
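# Series.dt.week was removed in pandas 2.0; dt.isocalendar().week (pandas >= 1.1) gives the ISO week
df_join[["time_code", "week"]][(df_join["time_code"].dt.isocalendar().week == 1) & (df_join["week"] != 1)]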

In [None]:
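# Series.dt.week was removed in pandas 2.0; dt.isocalendar().week (pandas >= 1.1) gives the ISO week
df_join[["time_code", "week"]][df_join["time_code"].dt.isocalendar().week == 53]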

WHAT I CAN CONCLUDE:
- I need to rework sales_currency and sales_usd to obtain a better distribution
- week takes the value 53: there are 47030 rows with week 53
    - I want to relate time_code and week
        - I NOTICED THAT week 
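To relate `time_code` and `week` as the notes above intend, one option is to compare the stored week with the ISO 8601 week derived from the date. The sketch below assumes pandas >= 1.1 for `dt.isocalendar()`; the stored `week` values in the toy frame are made up, and whether the dataset's `week` column follows ISO numbering is exactly what such a check would reveal.

```python
import pandas as pd

# Toy stand-in for df_join[["time_code", "week"]]; the stored week values are made up.
toy = pd.DataFrame({
    "time_code": pd.to_datetime(["2015-12-28", "2015-12-31", "2016-01-01", "2016-01-04"]),
    "week": [53, 53, 1, 1],
})

iso = toy["time_code"].dt.isocalendar()        # columns: year, week, day
toy["iso_year"] = iso["year"].astype("int64")
toy["iso_week"] = iso["week"].astype("int64")

# Rows where the stored week disagrees with the ISO week of the date
mismatch = toy[toy["week"] != toy["iso_week"]]
```

Under ISO numbering, 1 January 2016 still belongs to week 53 of 2015, which is one way a value of 53 can appear without leap years being involved.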

## Duplicates/Missing

In [None]:
# missing values
print("\nColumns with Missing Values::",df_join.columns[df_join.isnull().any()].tolist())
print(df_join.isnull().sum())   #0

# duplicates
print("\nNumber of duplicates is::")
print(df_join.duplicated().sum())   #0

## Relationships

#### - ID_RAM and TIME_CODE Attributes		- NV, 

DONE -> to be verified

In [None]:
## SPECIFIC CASE
x = "id_ram"
y = "time_code"

df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n----------- FIRST COLUMN: number of distinct " + y + " values; SECOND COLUMN: number of " + x + " values with that count")
print("We want to verify whether some value of " + x + " is associated with more than one value of " + y)
print(df_temp)
print()

df_temp = df_join.groupby(y)[x].nunique()
print()
print(df_temp)
print()

"""df_temp = df_join.groupby(y)[x].nunique().reset_index(name="new_" + x)
#df_temp = df_temp["new_" + x].value_counts()
#print(df_temp[df_temp.groupby(y)[y] == 1])
print()
print(df_temp)
print()"""


print("\n####------------------------- x and y in normal order -------------------------------------")

df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
#print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
#print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####------------------------- x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
#print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
#print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 


print("\n####------------------------- counts with x and y swapped -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- there is a single time_code (day) associated with 6 id_ram
- there is a single id_ram associated with 1840 different days
- the day on which the fewest distinct RAM modules were sold
- the day on which the most distinct RAM modules were sold
- the day with the fewest sales rows, not necessarily distinct modules (171) (2017-08-22)
- the day with the most sales rows, not necessarily distinct modules (8052) (2018-04-11)
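The MIN/MAX blocks in the cell above repeat the same pattern for each pair of attributes. A compact sketch of that pattern as a function is shown here on made-up data; the name `cardinality_extremes` is an assumption for illustration, not a helper from the notebook.

```python
import pandas as pd


def cardinality_extremes(df: pd.DataFrame, x: str, y: str):
    """Count distinct y values per x value; return the rows at the min and at the max."""
    counts = df.groupby(x)[y].nunique().reset_index(name="count")
    at_min = counts[counts["count"] == counts["count"].min()]
    at_max = counts[counts["count"] == counts["count"].max()]
    return at_min, at_max


# Toy stand-in for df_join (made-up values).
toy = pd.DataFrame({
    "id_ram": [1, 1, 1, 2, 3, 3],
    "time_code": pd.to_datetime([
        "2016-01-01", "2016-01-02", "2016-01-03",
        "2016-01-01", "2016-01-01", "2016-01-02",
    ]),
})

# Normal order: distinct days per product
fewest_days, most_days = cardinality_extremes(toy, "id_ram", "time_code")
# Swapped order: distinct products per day
fewest_ids, most_ids = cardinality_extremes(toy, "time_code", "id_ram")
```

Calling it once in each direction reproduces the "normal" and "swapped" sections without duplicating the code for every attribute pair.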

#### - ID_RAM and SALES_USD Attributes		- V, 

DONE -> to be verified

In [None]:
## SPECIFIC CASE
##------  continent: which other attribute occurs most often
x = "id_ram"
y = "sales_usd"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n----------- FIRST COLUMN: number of distinct " + y + " values; SECOND COLUMN: number of " + x + " values with that count")
print("We want to verify whether some value of " + x + " is associated with more than one value of " + y)
print(df_temp)
print()

df_temp = df_join.groupby(y)[x].nunique().value_counts().sort_index()
print("\n----------- FIRST COLUMN: number of distinct " + x + " values; SECOND COLUMN: number of " + y + " values with that count")
print("We want to verify whether some value of " + y + " is associated with more than one value of " + x)
print(df_temp)
print()

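# unique() returns arrays, which are not hashable: convert them to tuples before counting
df_temp = df_join.groupby(x)[y].unique().apply(tuple).value_counts()
print("\n----------- FIRST COLUMN: set of distinct " + y + " values; SECOND COLUMN: number of " + x + " values sharing that set")
print("We want to verify whether some value of " + x + " is associated with more than one value of " + y)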
print(df_temp)
print()



print("\n####------------------------- x and y in normal order -------------------------------------")

df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####------------------------- x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####------------------------- counts with x and y swapped -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()


WHAT I CAN CONCLUDE:
- there are 61 id_ram with a single price (OK)
- there is one id_ram with 8581 different prices (OK)
- there is one price associated with 271 different id_ram (the price shared by the most products) (99.99)
- the most frequent price in the dataset might be 89.99 USD (CHECK the last print)
- there are 2759825 different prices associated with a single RAM (the RAM with the most price variations) (same currency) (OK)
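The notes above distinguish two notions that are easy to conflate: the price that appears on the most rows and the price shared by the most distinct products. A small sketch on made-up values shows that the two can differ.

```python
import pandas as pd

# Toy stand-in for df_join[["id_ram", "sales_usd"]] (made-up values).
toy = pd.DataFrame({
    "id_ram":    [1,     1,     1,     2,     1,     2,     3,     4],
    "sales_usd": [89.99, 89.99, 89.99, 89.99, 99.99, 99.99, 99.99, 45.50],
})

# Number of rows per price: the most frequent price overall
rows_per_price = toy["sales_usd"].value_counts()
most_frequent_price = rows_per_price.idxmax()

# Number of distinct products per price: the price shared by the most products
products_per_price = toy.groupby("sales_usd")["id_ram"].nunique()
most_shared_price = products_per_price.idxmax()
```

On the real data, `df_join["sales_usd"].value_counts().head()` would be one way to check the open point about the most frequent price.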

#### - ID_RAM and SALES_CURRENCY Attributes	- NV,

DONE -> to be verified

In [None]:
## SPECIFIC CASE
x = "id_ram"
y = "sales_currency"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n----------- FIRST COLUMN: number of distinct " + y + " values; SECOND COLUMN: number of " + x + " values with that count")
print("We want to verify whether some value of " + x + " is associated with more than one value of " + y)
print(df_temp)
print()


"""df_temp = df_join.groupby(y)[x].unique().value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + x + " , WHILE THE 2ND is " + y)
print("We want to verify if some value of " + y + " is associated with more than one value of " + x )
print(df_temp)
print()"""


df_temp = df_join.groupby(x)[y].nunique()
print()
print(df_temp)
print()


print("\n####------------------------- x and y in normal order -------------------------------------")

df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####------------------------- x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####------------------------- counts with x and y swapped -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
with groupby
- there are 142 id_ram associated with a single price (OK)
- there is one id_ram with 6244 different prices (OK) (id_ram == 4377, Corsair Vengeance 16GB DDR4)

- 259545 unique prices (in total) 
- 231088 total prices
- the overall most frequent price (438 times) is 79.99 (in sales_currency) 
- 18.79 (not necessarily USD) occurs 3156 times across non-unique id_ram

#### - ID_RAM and BRAND Attributes		- U&V, 

DONE -> to be verified

In [None]:
## SPECIFIC CASE
x = "brand"
y = "id_ram"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n----------- FIRST COLUMN: number of distinct " + y + " values; SECOND COLUMN: number of " + x + " values with that count")
print("We want to verify whether some value of " + x + " is associated with more than one value of " + y)
print(df_temp)
print()


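# unique() returns arrays, which are not hashable: convert them to tuples before counting
df_temp = df_join.groupby(y)[x].unique().apply(tuple).value_counts()
print("\n----------- FIRST COLUMN: set of distinct " + x + " values; SECOND COLUMN: number of " + y + " values sharing that set")
print("We want to verify whether some value of " + y + " is associated with more than one value of " + x)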
print(df_temp)
print()


print("\n####------------------------- x and y in normal order -------------------------------------")
y = "brand"
x = "id_ram"

df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####------------------------- x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####------------------------- counts with x and y swapped -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- verified that each id_ram has exactly one associated brand

----------------------------------------------------------------
- 8 brands sell a single version of a single product (OK)
- one brand (G.SKILL) sells 517 distinct products, for a total of 833668 rows (max) (OK)
- the brand with the fewest sales is DANE-ELEC
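The first conclusion above is a functional-dependency check: every `id_ram` maps to exactly one `brand`. A minimal sketch of that check for several attributes at once follows, on made-up data; the helper name `determined_by` is an assumption for illustration.

```python
import pandas as pd


def determined_by(df: pd.DataFrame, key: str, attrs: list) -> pd.Series:
    """For each attribute, report whether every key value has at most one distinct value."""
    return df.groupby(key)[attrs].nunique().le(1).all()


# Toy stand-in for df_join (made-up values): brand depends on id_ram, vendor does not.
toy = pd.DataFrame({
    "id_ram": [1, 1, 2, 2, 3],
    "brand":  ["ADATA", "ADATA", "G.SKILL", "G.SKILL", "G.SKILL"],
    "vendor": ["shop_a", "shop_b", "shop_a", "shop_a", "shop_c"],
})

result = determined_by(toy, "id_ram", ["brand", "vendor"])
```

Passing a longer attribute list (for example `ram_model`, `memory_dim`, `memory_type`, `clock`) would test the remaining product attributes in a single call.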

#### - ID_RAM and RAM_MODEL Attributes		- U&V,

DONE -> to be verified

In [None]:
## SPECIFIC CASE
x = "ram_model"
y = "id_ram"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n----------- FIRST COLUMN: number of distinct " + y + " values; SECOND COLUMN: number of " + x + " values with that count")
print("We want to verify whether some value of " + x + " is associated with more than one value of " + y)
print(df_temp)
print()


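# unique() returns arrays, which are not hashable: convert them to tuples before counting
df_temp = df_join.groupby(y)[x].unique().apply(tuple).value_counts()
print("\n----------- FIRST COLUMN: set of distinct " + x + " values; SECOND COLUMN: number of " + y + " values sharing that set")
print("We want to verify whether some value of " + y + " is associated with more than one value of " + x)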
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].nunique()
print()
print(df_temp)
print()


print("\n####------------------------- x and y in normal order -------------------------------------")
y = "ram_model"
x = "id_ram"

df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####------------------------- x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####------------------------- counts with x and y swapped -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- verified that each id_ram has exactly one associated name;
    the same holds for memory size, memory type and clock frequency,
        which are unique for each id_ram


----------------------------------------------------------------
- there are 51 RAM models (different names) sold in a single version (frequency, memory, technology, etc.) (OK)
- one model (Corsair Vengeance) is present in 113 different, non-unique versions (frequency, memory, technology, etc.) (OK)
- the model that was sold only once is the Kingston Lv Xmp 10Th Anniversary,
    apparently because it is a special edition
- the model with the most sales is the Corsair Vengeance  (266917)

#### - ID_RAM and VENDOR Attributes		- NV, 

DONE -> to be verified

In [None]:
## SPECIFIC CASE
x = "id_ram"
y = "vendor"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n----------- FIRST COLUMN: number of distinct " + y + " values; SECOND COLUMN: number of " + x + " values with that count")
print("We want to verify whether some value of " + x + " is associated with more than one value of " + y)
print(df_temp)
print()


print("\n####------------------------- x and y in normal order -------------------------------------")

df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####------------------------- x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 


print("\n####------------------------- counts with x and y swapped -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- there are 1306 different id_ram associated with a single vendor (OK)
    - the vendor that purchased the most different RAM, 
- there is a single id_ram associated with 64 vendors (4377) (OK)
- the vendor with the fewest sales (7) seems to be Monoprice, which sells only 4 distinct models
- the vendor with the most distinct id_ram sold (1733) seems to be geizhals_unknown, which also has
    the highest total sales (2043095)
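The vendor figures above combine two measures per vendor: the number of distinct products and the total number of sales rows. Both can be obtained in one pass with named aggregation (available since pandas 0.25). This is a sketch on made-up data; the vendor names and the output column names are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for df_join[["vendor", "id_ram"]] (made-up values).
toy = pd.DataFrame({
    "vendor": ["shop_a", "shop_a", "shop_a", "shop_b", "shop_b", "shop_c"],
    "id_ram": [1, 1, 2, 1, 3, 3],
})

# One row per vendor: distinct products sold and total sales rows
summary = toy.groupby("vendor")["id_ram"].agg(
    distinct_products="nunique",
    rows="count",
)

top_vendor = summary["rows"].idxmax()      # vendor with the most sales rows
bottom_vendor = summary["rows"].idxmin()   # vendor with the fewest sales rows
```

Sorting `summary` by either column gives the full ranking rather than only the extremes.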

#### - ID_RAM and CONTINENT Attributes		- NV, 

DONE -> to be verified

In [None]:
## SPECIFIC CASE
x = "id_ram"
y = "continent"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n----------- FIRST COLUMN: number of distinct " + y + " values; SECOND COLUMN: number of " + x + " values with that count")
print("We want to verify whether some value of " + x + " is associated with more than one value of " + y)
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].unique()#.value_counts()
print("\n----------- FIRST COLUMN: " + x + "; SECOND COLUMN: array of the distinct " + y + " values for that " + x)
print("We want to verify whether some value of " + x + " is associated with more than one value of " + y)
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()


print("\n####------------------------- x and y in normal order -------------------------------------")
x = "continent"
y = "id_ram"

df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(y1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()


In [None]:
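# Note: after filtering on one continent, nunique() is 1 for every id_ram,
# so this only lists the id_ram values sold in Europe.
df_temp = df_join[df_join["continent"] == "Europe"]
df_temp = df_temp.groupby("id_ram")["continent"].nunique()
print(df_temp)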

WHAT I CAN CONCLUDE:
- 1036 distinct id_ram values are associated with a single continent
- 1122 distinct id_ram values are associated with all 3 continents
- Europe is the continent with the largest number (825) of RAM models sold exclusively there
- The continent with the fewest distinct id_ram imported (1548) is America
- The continent with the most distinct id_ram imported (2904) is Europe
- Oceania should be the continent with the fewest RAM units sold/imported
- Europe is the continent with the most RAM units sold/imported
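
The "sold exclusively in one continent" figure has to be computed on the full data set, because filtering on a continent first makes every per-id_ram count equal to 1. A minimal sketch on hypothetical toy data (only the column names are taken from `df_join`):

```python
import pandas as pd

# Toy stand-in for df_join (hypothetical rows).
toy = pd.DataFrame({
    "id_ram":    [1, 1, 2, 3, 3, 4, 5],
    "continent": ["Europe", "America", "Europe", "Europe", "Europe", "Oceania", "America"],
})

# Number of distinct continents each id_ram appears in.
continents_per_ram = toy.groupby("id_ram")["continent"].nunique()

# Keep only the id_ram values seen in exactly one continent.
exclusive_ids = continents_per_ram[continents_per_ram == 1].index

# Count those exclusive models per continent.
exclusive_per_continent = (
    toy[toy["id_ram"].isin(exclusive_ids)]
    .groupby("continent")["id_ram"]
    .nunique()
    .sort_values(ascending=False)
)
print(exclusive_per_continent)
```

Here id_ram 1 is excluded because it appears in two continents, so Europe is left with two exclusive models.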

#### - ID_RAM and COUNTRY Attributes		- NV, 

DONE -> to verify

In [None]:
## SPECIFIC CASE
x = "id_ram"
y = "country"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


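# unique() returns NumPy arrays, which are unhashable; convert them to tuples before counting
df_temp = df_join.groupby(x)[y].unique().apply(tuple).value_counts()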
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()


print("\n####-------------------------x and y in normal order -------------------------------------")
y = "id_ram"
x = "country"

df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(y1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- 280 distinct id_ram values are associated with a single country (OK)
- 128 distinct id_ram values are associated with 11 countries (OK)
- however, 1176 distinct id_ram values are associated with only 2 countries
- Italy is the country with the fewest distinct id_ram sold (238)
- Germany is the country with the most distinct RAM models sold (2324)
- the USA purchased 147 distinct RAM models exclusively
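
Counting which combinations of countries occur together needs hashable values, since `unique()` returns one NumPy array per group. The sketch below, on hypothetical toy data, converts each array to a sorted tuple so that the same set of countries is counted once regardless of row order; sorting assumes the column has no missing values.

```python
import pandas as pd

# Toy stand-in for df_join (hypothetical rows).
toy = pd.DataFrame({
    "id_ram":  [1, 1, 2, 2, 3, 4],
    "country": ["Italy", "Germany", "Germany", "Italy", "USA", "USA"],
})

# One sorted tuple of countries per id_ram (tuples are hashable, arrays are not).
combos = (
    toy.groupby("id_ram")["country"]
    .unique()
    .apply(lambda arr: tuple(sorted(arr)))
)

# How many id_ram values share each combination of countries.
combo_counts = combos.value_counts()
print(combo_counts)
```

A single-country combination such as `("USA",)` then directly gives the number of models sold exclusively in that country.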

#### - ID_RAM and REGION Attributes		- NV, 

DONE -> to verify

In [None]:
## SPECIFIC CASE
x = "id_ram"
y = "region"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


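# unique() returns NumPy arrays, which are unhashable; convert them to tuples before counting
df_temp = df_join.groupby(x)[y].unique().apply(tuple).value_counts()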
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()


print("\n####-------------------------x and y in normal order -------------------------------------")
y = "id_ram"
x = "region"

df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(y1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- 18 distinct RAM models were purchased exclusively by a single region (OK)
- one RAM model (id_ram 5756) was purchased by 75 regions (OK)
- the region that purchased the fewest distinct RAM models is north island-central east (Oceania)
- the region with the most distinct RAM models sold is saarland (Europe)
- the region with the fewest RAM purchases (non-unique) (126) is south italy (Europe)
- the region with the most RAM purchases (non-unique) (144085) is saxony-anhalt (??)
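
The notes above mix two different measures: distinct models (`nunique`) and total rows ("non-unique" purchases). A small sketch on hypothetical toy data shows both side by side with named aggregation; note that `size` counts every row, whereas the `count()` used in the cell above skips missing values.

```python
import pandas as pd

# Toy stand-in for df_join (hypothetical rows).
toy = pd.DataFrame({
    "region": ["saarland", "saarland", "saarland", "south italy", "south italy"],
    "id_ram": [1, 2, 2, 1, 1],
})

# distinct_ram: number of different models; rows: number of sale records.
summary = toy.groupby("region")["id_ram"].agg(distinct_ram="nunique", rows="size")
print(summary)
```

Sorting `summary` by either column then gives the two rankings, which need not agree.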

#### - TIME_CODE and BRAND Attributes		- ?&V, 

DONE -> to verify

In [None]:
## SPECIFIC CASE
x = "time_code"
y = "brand"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


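# unique() returns NumPy arrays, which are unhashable; convert them to tuples before counting
df_temp = df_join.groupby(x)[y].unique().apply(tuple).value_counts()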
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()


print("\n####-------------------------x and y in normal order -------------------------------------")
y = "time_code"
x = "brand"

df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- there is a single day with 9 brands purchased (OK) (2017-10-13)
- there is a single day with 41 brands purchased (OK) (2018-04-08)
- the brands purchased on the fewest distinct days (2) are DANE-ELEC and GALAX/KFA2
- the brands purchased on the most distinct days are 7 in total
- the day with the fewest RAM purchases (non-unique) (172) is 2017-08-22
- the day with the most RAM purchases (non-unique) (8052) is 2018-04-11
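
The minimum and maximum days can be extracted with a boolean mask, which keeps every tied day rather than only the first one. A minimal sketch on hypothetical toy data (dates and brands are invented):

```python
import pandas as pd

# Toy stand-in for df_join (hypothetical rows).
toy = pd.DataFrame({
    "time_code": pd.to_datetime([
        "2018-01-01", "2018-01-01", "2018-01-02",
        "2018-01-03", "2018-01-03", "2018-01-03",
    ]),
    "brand": ["A", "B", "A", "A", "B", "C"],
})

# Distinct brands seen on each day.
brands_per_day = toy.groupby("time_code")["brand"].nunique()

# Boolean masks keep all days that tie for the minimum or the maximum.
fewest = brands_per_day[brands_per_day == brands_per_day.min()]
most = brands_per_day[brands_per_day == brands_per_day.max()]
print(fewest)
print(most)
```

If `fewest` or `most` has more than one row, the "single day" claim does not hold for that extreme.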

#### - WEEK and COUNTRY Attributes		- V, 

DONE -> to verify

In [None]:
## SPECIFIC CASE
y = "week"
x = "country"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()

df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()


print("\n####-------------------------x and y in normal order -------------------------------------")

df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- there is a single country (Italy) with 26 associated weeks (OK)
- 9 countries sell all year round (OK)
- the weeks with the fewest distinct countries selling RAM (9) are 16, 17, 18 and 22
- the weeks with the most distinct countries selling RAM (11) are 26 in total (1..15)(41...53)
- the week with the fewest total purchases is week 1 (35641)
- the week with the most total purchases is week 12 (71950)
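
"Sells all year round" can be checked by comparing each country's number of distinct weeks with the total number of distinct weeks in the data. A minimal sketch on hypothetical toy data (four weeks instead of a full year):

```python
import pandas as pd

# Toy stand-in for df_join (hypothetical rows).
toy = pd.DataFrame({
    "country": ["Germany"] * 4 + ["Italy"] * 2,
    "week":    [1, 2, 3, 4, 1, 2],
})

# Total number of distinct weeks present in the data.
total_weeks = toy["week"].nunique()

# Distinct weeks with at least one sale, per country.
weeks_per_country = toy.groupby("country")["week"].nunique()

# Countries that have sales in every week of the data set.
all_year = weeks_per_country[weeks_per_country == total_weeks].index.tolist()
print(all_year)
```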

#### - SALES_USD and BRAND Attributes		- NV, 

DONE -> to verify

In [None]:
## SPECIFIC CASE
x = "sales_usd"
y = "brand"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(y)[x].value_counts()
print()
print(df_temp)
print()


print("\n####-------------------------x and y in normal order -------------------------------------")
y = "sales_usd"
x = "brand"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- 2818449 prices (USD) are associated with a single brand (OK)
    - this may mean that it is the brand with the largest price variation, or at least the most price changes
- there are 17 brands with only 3 distinct prices for their models (OK)
- the brand with the fewest associated prices (non-unique) (2) is DANE-ELEC
- the brand with the most associated prices (non-unique) (723593) is G.SKILL
- the most frequently used price is 89.99
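
Because `sales_usd` is a float64 column, prices that differ only by a tiny fraction are treated as distinct when grouping, which may inflate the number of "different" prices. The sketch below rounds to cents before counting; rounding to two decimals is an assumption about what a price means here, and the toy data is hypothetical.

```python
import pandas as pd

# Toy stand-in for df_join (hypothetical rows).
toy = pd.DataFrame({
    "brand":     ["G.SKILL", "G.SKILL", "CORSAIR", "CORSAIR", "KINGSTON"],
    "sales_usd": [89.99, 89.990000001, 89.99, 45.5, 45.5],
})

# Round to cents so near-identical floats collapse into one price (assumption).
prices = toy["sales_usd"].round(2)

# Frequency of each price, most common first.
price_counts = prices.value_counts()
most_common_price = price_counts.idxmax()
print(price_counts)
print(most_common_price)
```

Without the rounding step the toy data would show three distinct prices instead of two.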

#### - SALES_USD and VENDOR Attributes		- NV, 

DONE -> to verify

In [None]:
## SPECIFIC CASE
x = "sales_usd"
y = "vendor"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(y)[x].value_counts()
print()
print(df_temp)
print()


print("\n####-------------------------x and y in normal order -------------------------------------")
y = "sales_usd"
x = "vendor"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- there is a vendor with 2878768 distinct prices for its RAM models (OK) (pricespy_unknown ???)
    - this may mean that it is the vendor with the largest price variation, or at least the most price changes
- there are 13 vendors with only 5 distinct prices for their models (OK)
- the vendor with the fewest prices (4), non-unique, is Monoprice
- the vendor with the most prices (1824646), non-unique, is geizhals_unknown
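
A high count of distinct prices does not by itself show high price variation, since a large vendor can have many prices that are all close together. One way to test the hypothesis in the note above is to compare the count with a spread measure. The sketch uses the coefficient of variation (standard deviation divided by mean) as one possible choice; both the measure and the toy data are assumptions, not something the notebook computes.

```python
import pandas as pd

# Toy stand-in for df_join (hypothetical rows).
toy = pd.DataFrame({
    "vendor":    ["A", "A", "A", "B", "B", "B"],
    "sales_usd": [10.0, 10.0, 10.0, 5.0, 10.0, 15.0],
})

# Distinct prices, mean and sample standard deviation per vendor.
spread = toy.groupby("vendor")["sales_usd"].agg(
    n_prices="nunique", mean="mean", std="std"
)

# Coefficient of variation: spread relative to the price level.
spread["cv"] = spread["std"] / spread["mean"]
print(spread)
```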

#### - CURRENCY and MEMORY_TYPE Attributes	- NV, 

DONE -> to verify

In [None]:
## SPECIFIC CASE
x = "currency"
y = "memory_type"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


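# unique() returns NumPy arrays, which are unhashable; convert them to tuples before counting
df_temp = df_join.groupby(x)[y].unique().apply(tuple).value_counts()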
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()


print("\n####-------------------------x and y in normal order -------------------------------------")
y = "currency"
x = "memory_type"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- 3 currencies (NZD, AUD, GBP) are each associated with 3 types of RAM (OK)
- one currency (EUR) is associated with 7 RAM technologies (OK)
- the technologies with the most associated currencies are DDR2, DDR3 and DDR4
- the currency with the fewest products sold (56648) is NZD
- the currency with the most products sold (2374526) is EUR
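
For two low-cardinality attributes such as `currency` and `memory_type`, a contingency table shows all of the counts above at once. A minimal sketch with `pd.crosstab` on hypothetical toy data:

```python
import pandas as pd

# Toy stand-in for df_join (hypothetical rows).
toy = pd.DataFrame({
    "currency":    ["EUR", "EUR", "EUR", "NZD", "NZD"],
    "memory_type": ["DDR3", "DDR4", "DDR4", "DDR4", "DDR4"],
})

# Rows: currency, columns: memory type, cells: number of sale records.
table = pd.crosstab(toy["currency"], toy["memory_type"])

# Number of distinct memory types sold in each currency.
types_per_currency = (table > 0).sum(axis=1)
print(table)
print(types_per_currency)
```

Row sums of `table` give the products sold per currency, and `(table > 0).sum(axis=0)` gives the number of currencies per memory type.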

#### - CURRENCY and VENDOR Attributes		- V, 

DONE -> to verify

In [None]:
## SPECIFIC CASE
y = "currency"
x = "vendor"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


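# unique() returns NumPy arrays, which are unhashable; convert them to tuples before counting
df_temp = df_join.groupby(x)[y].unique().apply(tuple).value_counts()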
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()


print("\n####-------------------------x and y in normal order -------------------------------------")

df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- 25 vendors use a single currency (OK)
- one vendor (Mighty Ape) sells in 4 currencies (OK)
- the currency with the fewest vendors (non-unique) (8) is CAD
- the currency with the most vendors (non-unique) (37) is EUR
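
The question each cell asks ("is some value of x associated with more than one value of y?") amounts to checking whether x determines y. A rough sketch of that check as a reusable function; the name `violates_dependency` and the toy rows are hypothetical:

```python
import pandas as pd

def violates_dependency(df, lhs, rhs):
    """Return the `lhs` values that map to more than one distinct `rhs` value."""
    counts = df.groupby(lhs)[rhs].nunique()
    return counts[counts > 1]

# Toy stand-in for df_join (hypothetical rows).
toy = pd.DataFrame({
    "vendor":   ["Mighty Ape", "Mighty Ape", "Monoprice", "Monoprice"],
    "currency": ["NZD", "AUD", "USD", "USD"],
})

# Vendors selling in more than one currency.
violations = violates_dependency(toy, "vendor", "currency")
print(violations)
```

An empty result would mean every vendor uses exactly one currency; otherwise the returned series lists the exceptions together with their number of currencies.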

#### - CURRENCY and CONTINENT Attributes	- ?&NV, 

DONE -> to verify

In [None]:
## SPECIFIC CASE
x = "currency"
y = "continent"

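# unique() returns NumPy arrays, which are unhashable; convert them to tuples before counting
df_temp = df_join.groupby(x)[y].unique().apply(tuple).value_counts()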
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()


print("\n####-------------------------x and y in normal order -------------------------------------")
y = "currency"
x = "continent"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- there are 2 currencies per continent
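
The question behind this cell is whether each value of one attribute maps to a single value of the other. A small sketch of that check follows, assuming pandas. The helper name `violates_dependency` and the toy rows are hypothetical and do not come from the dataset.

```python
import pandas as pd


def violates_dependency(df, key_col, dependent_col):
    """Return the values of `key_col` associated with more than one
    distinct value of `dependent_col`."""
    n_distinct = df.groupby(key_col)[dependent_col].nunique()
    return n_distinct[n_distinct > 1].index.tolist()


# Toy data invented for illustration
toy = pd.DataFrame({
    "currency":  ["EUR", "EUR", "USD", "USD", "AUD"],
    "continent": ["Europe", "Europe", "America", "Europe", "Oceania"],
})

bad = violates_dependency(toy, "currency", "continent")
print(bad)  # currencies that appear in more than one continent
```

An empty list would mean that the key attribute determines the dependent one in the data at hand.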

#### - CURRENCY and COUNTRY Attributes		- NV, 

DONE -> verify

In [None]:
## SPECIFIC CASE
x = "currency"
y = "country"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


# unique() yields arrays, which are unhashable; convert to tuples before counting
df_temp = df_join.groupby(x)[y].unique().apply(tuple).value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()


print("\n####-------------------------x and y in normal order -------------------------------------")
y = "currency"
x = "country"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- EUR is the currency that can be used to buy in the largest number of countries (6)
- each of the other currencies can be used to buy in a single country
- there are 11 countries
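
A contingency table gives a compact view of which countries each currency appears in. The sketch below uses `pd.crosstab` on toy rows invented for illustration. On the notebook's data the inputs would presumably be `df_join["currency"]` and `df_join["country"]`.

```python
import pandas as pd

# Toy data invented for illustration
toy = pd.DataFrame({
    "currency": ["EUR", "EUR", "EUR", "USD", "GBP"],
    "country":  ["Italy", "Germany", "Italy", "United States", "United Kingdom"],
})

# Rows: currencies, columns: countries, cells: number of records
table = pd.crosstab(toy["currency"], toy["country"])

# Number of distinct countries in which each currency is used
countries_per_currency = (table > 0).sum(axis=1)

print(table)
print(countries_per_currency)
```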

#### - CURRENCY and REGION Attributes		- NV, 

DONE -> verify

In [None]:
## SPECIFIC CASE
x = "currency"
y = "region"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


# unique() yields arrays, which are unhashable; convert to tuples before counting
df_temp = df_join.groupby(y)[x].unique().apply(tuple).value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + x + " , WHILE THE 2ND IS " + y)
print("We want to verify if some value of " + y + " is associated with more than one value of " + x )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()


print("\n####-------------------------x and y in normal order -------------------------------------")
y = "currency"
x = "region"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- the euro is used in 38 regions, making it the most widely used currency
- 2 currencies (CAD-AUD) are each used in 6 regions

#### - BRAND and WEEK Attributes		- NV, 

DONE -> verify


In [None]:
## SPECIFIC CASE
x = "brand"
y = "week"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()


print("\n####-------------------------x and y in normal order -------------------------------------")
y = "brand"
x = "week"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- one brand (PAREEMA) sells in only one week of the year (16) (could be removed as an outlier)
    - 3 brands sell in only 2 weeks of the year (could be removed as outliers)
- 35(37)(38) brands sell all year round
- the weeks in which the fewest brands (39) were bought are 21-53
- the weeks in which the most brands (45) were bought are 11-14
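
If the brands active in very few weeks are to be dropped as outlier candidates, the filter could look like the sketch below. It assumes pandas. The helper `rare_groups`, the threshold `max_distinct=2` and the toy rows are hypothetical choices for illustration, not decisions taken in the notebook.

```python
import pandas as pd


def rare_groups(df, group_col, value_col, max_distinct):
    """Return the groups of `group_col` that have at most `max_distinct`
    distinct values of `value_col`."""
    n_distinct = df.groupby(group_col)[value_col].nunique()
    return n_distinct[n_distinct <= max_distinct].index.tolist()


# Toy data invented for illustration
toy = pd.DataFrame({
    "brand": ["X", "X", "X", "Y", "Z", "Z"],
    "week":  [1, 2, 3, 16, 4, 5],
})

rare = rare_groups(toy, "brand", "week", max_distinct=2)

# Keep only the rows whose brand is not in the rare list
df_filtered = toy[~toy["brand"].isin(rare)]

print(rare)
print(df_filtered)
```

Whether such brands are true outliers or simply small producers is a judgment call, so it may be safer to inspect the list before filtering.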

#### - BRAND and CURRENCY Attributes		- NV, 

DONE -> verify

In [None]:
## SPECIFIC CASE
x = "brand"
y = "currency"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


# unique() yields arrays, which are unhashable; convert to tuples before counting
df_temp = df_join.groupby(x)[y].unique().apply(tuple).value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()


print("\n####-------------------------x and y in normal order -------------------------------------")
y = "brand"
x = "currency"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- 14 brands sell in a single currency (OK)
- 12 brands sell in all the currencies in the dataset

#### - BRAND and RAM_MODEL Attributes		- ?&V, 

DONE -> verify

In [None]:
## SPECIFIC CASE
x = "brand"
y = "ram_model"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


# unique() yields arrays, which are unhashable; convert to tuples before counting
df_temp = df_join.groupby(x)[y].unique().apply(tuple).value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()


print("\n####-------------------------x and y in normal order -------------------------------------")
y = "brand"
x = "ram_model"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- 27 brands have a single RAM model
- one brand (MUSHKIN) has 39 products
- 279 ram_model values are associated with a single brand
- one model (Galax Hof) is associated with 2 brands (OUTLIER)
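
To see which brands are attached to a model that should belong to one brand only, the distinct brands can be collected per model. This is a sketch on toy rows invented for illustration. On the notebook's data the frame would presumably be `df_join` with the same column names.

```python
import pandas as pd

# Toy data invented for illustration
toy = pd.DataFrame({
    "ram_model": ["M1", "M1", "M2", "M3", "M3"],
    "brand":     ["A", "B", "A", "C", "C"],
})

# For each model, the sorted list of distinct brands that appear with it
brands_per_model = toy.groupby("ram_model")["brand"].agg(lambda s: sorted(set(s)))

# Keep only the models shared by more than one brand
shared = brands_per_model[brands_per_model.apply(len) > 1]

print(shared)
```

Listing the brands side by side helps decide whether a shared model is a data-entry inconsistency or a genuine case of two brands using the same model name.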

#### - BRAND and MEMORY_DIM Attributes		- NV, 

DONE -> verify

In [None]:
## SPECIFIC CASE
x = "brand"
y = "memory_dim"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


# unique() yields arrays, which are unhashable; convert to tuples before counting
df_temp = df_join.groupby(x)[y].unique().apply(tuple).value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()



print("\n####-------------------------x and y in normal order -------------------------------------")
y = "brand"
x = "memory_dim"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- 9 brands sell memory of a single size (OK)
- 1 brand (CRUCIAL) sells memory in 17 different sizes
- the memory sizes sold by a single brand are 0.12-96-256
- the memory size sold by the most brands (38) is 8GB

#### - BRAND and MEMORY_TYPE Attributes		- NV, 

DONE -> verify

In [None]:
## SPECIFIC CASE
x = "brand"
y = "memory_type"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


# unique() yields arrays, which are unhashable; convert to tuples before counting
df_temp = df_join.groupby(x)[y].unique().apply(tuple).value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()


print("\n####-------------------------x and y in normal order -------------------------------------")
y = "brand"
x = "memory_type"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- 13 brands sell memory of a single type (OK)
- 3 brands (G.SKILL-KINGSTON-MUSHKIN) sell memory of all types (OK)
- the memory type with the fewest brands (1) is DDR3U (exclusive)
- the memory type with the most brands (40) is DDR3 (exclusive)
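
Statements such as "sells all types" can be checked by comparing each brand's number of distinct types with the total number of types in the data. Below is a minimal sketch, assuming pandas, with toy rows invented for illustration.

```python
import pandas as pd

# Toy data invented for illustration
toy = pd.DataFrame({
    "brand":       ["A", "A", "A", "B", "C", "C"],
    "memory_type": ["DDR3", "DDR4", "DDR2", "DDR3", "DDR3", "DDR4"],
})

# Total number of distinct memory types in the data
total_types = toy["memory_type"].nunique()

# Number of distinct memory types sold by each brand
types_per_brand = toy.groupby("brand")["memory_type"].nunique()

# Brands covering every type, and brands covering exactly one
full_coverage = types_per_brand[types_per_brand == total_types].index.tolist()
single_type = types_per_brand[types_per_brand == 1].index.tolist()

print(full_coverage)
print(single_type)
```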

#### - BRAND and CLOCK Attributes		- NV, 

DONE -> verify

In [None]:
## SPECIFIC CASE
x = "brand"
y = "clock"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


# unique() yields arrays, which are unhashable; convert to tuples before counting
df_temp = df_join.groupby(x)[y].unique().apply(tuple).value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()


print("\n####-------------------------x and y in normal order -------------------------------------")
y = "brand"
x = "clock"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- 10 brands sell memory with a single clock frequency (OK)
- a single brand (CORSAIR) sells memory with 37 clock frequencies (OK)
- 7 clock frequencies are sold by a single brand
- the frequency sold by the most brands (36) is 1600

#### - BRAND and VENDOR Attributes		- V, 

DONE -> verify

In [None]:
## SPECIFIC CASE
x = "brand"
y = "vendor"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


# unique() yields arrays, which are unhashable; convert to tuples before counting
df_temp = df_join.groupby(x)[y].unique().apply(tuple).value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()



print("\n####-------------------------x and y in normal order -------------------------------------")
y = "brand"
x = "vendor"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- one brand (CORSAIR) is sold by 72 vendors (OK)
- 16 brands are sold by a single vendor (OK)
- 4 vendors sell a single brand
- 1 vendor (geizhals_unknown) sells the most brands (30)
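
Counts such as "16 brands with a single vendor" come from the `nunique().value_counts().sort_index()` chain in the SPECIFIC CASE step. The sketch below, on hypothetical toy data, shows how to read that output: the index is the number of distinct `y` values and the value is how many `x` groups have that number.

```python
import pandas as pd

# Toy stand-in for df_join (hypothetical values)
toy = pd.DataFrame({
    "brand":  ["A", "A", "A", "B", "C", "D", "D"],
    "vendor": ["v1", "v2", "v3", "v1", "v2", "v1", "v1"],
})

# Step 1: distinct vendors per brand
vendors_per_brand = toy.groupby("brand")["vendor"].nunique()

# Step 2: how many brands share each distinct-vendor count
distribution = vendors_per_brand.value_counts().sort_index()
print(distribution)
```

The values of `distribution` always sum to the number of distinct brands, which is a quick sanity check on the reported figures.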

#### - BRAND and CONTINENT Attributes		- NV, 

DONE -> to be verified


In [None]:
## SPECIFIC CASE
x = "brand"
y = "continent"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].unique().value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()


print("\n####-------------------------x and y as given -------------------------------------")
y = "brand"
x = "continent"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- 16 brands sell in a single continent (OK)
- 23 brands sell in all continents (OK)
- the continent with the fewest brands (28) is America
- the continent with the most brands (46) is Europe
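
To list which brands fall into the "single continent" and "all continents" groups, rather than only counting them, one possible sketch is shown below. The toy frame and the variable names `everywhere` and `single` are illustrative assumptions; with the real data `toy` would be `df_join`.

```python
import pandas as pd

# Toy stand-in for df_join (hypothetical values)
toy = pd.DataFrame({
    "brand": ["A", "A", "A", "B", "B", "C"],
    "continent": ["Europe", "America", "Oceania", "Europe", "Europe", "America"],
})

n_continents = toy["continent"].nunique()
per_brand = toy.groupby("brand")["continent"].nunique()

# Brands present in every continent, and brands present in exactly one
everywhere = per_brand[per_brand == n_continents].index.tolist()
single = per_brand[per_brand == 1].index.tolist()

print("all continents:", everywhere)
print("single continent:", single)
```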

#### - BRAND and COUNTRY Attributes		- NV, 

DONE -> to be verified

In [None]:
## SPECIFIC CASE
x = "brand"
y = "country"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].unique().value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()


print("\n####-------------------------x and y as given -------------------------------------")
y = "brand"
x = "country"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- 4 brands sell in a single country (OK)
- 6 brands sell in all (11) countries (OK)
- the countries with the fewest brands (9) are France and Italy
- the country with the most brands (38) is Germany

#### - BRAND and REGION Attributes		- V, 

DONE -> to be verified

In [None]:
## SPECIFIC CASE
x = "brand"
y = "region"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()


print("\n####-------------------------x and y as given -------------------------------------")
y = "brand"
x = "region"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- one brand (DAN-ELEC) sells in a single region (west-midlands)
- 3 brands sell in 75 regions (OK)
- 3 regions have the fewest brands (4)
- 4 regions have the most brands (34)

#### - RAM_MODEL and TIME_CODE Attributes	- NV, 

DONE -> to be verified


In [None]:
## SPECIFIC CASE
x = "ram_model"
y = "time_code"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp.head(30))
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()

df_temp = df_join.groupby(x)[y].nunique().reset_index(name="new_" + y)
print(df_temp[df_temp.new_time_code == 1])
print()
print(df_temp)
print()


print("\n####-------------------------x and y as given -------------------------------------")
y = "ram_model"
x = "time_code"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- some ram models have very few associated time_codes;
    in particular, one RAM (Kingston Lv Xmp 10Th Anniversary) has a single associated time_code
        (which does not necessarily mean a single purchase was made)
- one ram model has 96 associated time_codes
- the day on which the fewest ram models were sold (33) is 2017-12-10
- the day on which the most ram models were sold (232) is 2018-04-08
- 6 ram models have the highest number of sales (1840)
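
The remark that a single time_code does not necessarily mean a single purchase can be checked by comparing distinct dates with row counts per model. This is a minimal sketch on hypothetical toy data; the column names `n_dates` and `n_rows` are introduced here only for illustration.

```python
import pandas as pd

# Toy stand-in for df_join (hypothetical values)
toy = pd.DataFrame({
    "ram_model": ["M1", "M1", "M1", "M2", "M2"],
    "time_code": pd.to_datetime(
        ["2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-02"]
    ),
})

# Distinct sale dates versus number of sale rows, per ram_model
summary = toy.groupby("ram_model").agg(
    n_dates=("time_code", "nunique"),
    n_rows=("time_code", "size"),
)
print(summary)

# Models seen on one date only, together with how many rows they have
print(summary[summary["n_dates"] == 1])
```

In the toy data, `M1` has one distinct date but three rows, which is exactly the situation the note warns about.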

#### - RAM_MODEL and WEEK Attributes		- NV, 

DONE -> to be verified

In [None]:
## SPECIFIC CASE
x = "ram_model"
y = "week"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].unique().value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()


print("\n####-------------------------x and y as given -------------------------------------")
y = "ram_model"
x = "week"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- 3 ram_models are associated with a single week (16) (OK)
- most ram_models (186) cover all weeks (OK)
- the week with the fewest ram_models sold (215) is week 29
- the week with the most ram_models sold (249) is week 14


#### - RAM_MODEL and SALES_USD Attributes	- NV, 

DONE -> to be verified

In [None]:
## SPECIFIC CASE
x = "ram_model"
y = "sales_usd"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()


print("\n####-------------------------x and y as given -------------------------------------")
y = "ram_model"
x = "sales_usd"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- 2 ram_models have only one associated price (OK)
- one ram_model (Corsair Vengeance) has 243577 associated USD prices
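
Because `sales_usd` is a float column, `nunique()` treats values that differ only in distant decimals as distinct prices. If those decimals come from currency conversion (an assumption; the notebook does not state how `sales_usd` was produced), rounding to cents before counting may give a more meaningful figure. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Toy stand-in for df_join (hypothetical values)
toy = pd.DataFrame({
    "ram_model": ["M1", "M1", "M1", "M2"],
    "sales_usd": [49.990001, 49.989999, 50.49, 20.0],
})

# Distinct prices per model on the raw floats
raw = toy.groupby("ram_model")["sales_usd"].nunique()

# Distinct prices per model after rounding to two decimals
rounded = (
    toy.assign(sales_usd=toy["sales_usd"].round(2))
    .groupby("ram_model")["sales_usd"]
    .nunique()
)

print(pd.DataFrame({"raw": raw, "rounded": rounded}))
```

Comparing the two columns shows how much of the price variety is due to sub-cent noise rather than genuinely different prices.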

#### - RAM_MODEL and SALES_CURRENCY Attributes	- NV, 

DONE -> to be verified

In [None]:
## SPECIFIC CASE
x = "ram_model"
y = "sales_currency"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].unique().value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()


print("\n####-------------------------x and y as given -------------------------------------")
y = "ram_model"
x = "sales_currency"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- 11 ram_models have only one associated price (OK)
- one ram_model (Corsair Vengeance) has 116017 associated prices (in different currencies) (OK)
- 263639 prices are associated with a single ram model
- the price shared by the most ram_models is 49.99

#### - RAM_MODEL and MEMORY_TYPE Attributes	- NV, 

DONE -> to be verified

In [None]:
## SPECIFIC CASE
x = "ram_model"
y = "memory_type"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].unique().value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()


print("\n####-------------------------x and y as given -------------------------------------")
y = "ram_model"
x = "memory_type"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- 160 ram_models are associated with a single memory technology
- 2 ram_models (Kingston Valueram, Mushkin Essentials) are associated with all RAM types
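
A presence table makes it easy to see which memory types each model covers and to name the models behind the two counts above. The sketch below uses `pd.crosstab` on hypothetical toy data; `presence`, `all_types` and `one_type` are illustrative names.

```python
import pandas as pd

# Toy stand-in for df_join (hypothetical values)
toy = pd.DataFrame({
    "ram_model": ["M1", "M1", "M1", "M2", "M3", "M3"],
    "memory_type": ["DDR2", "DDR3", "DDR4", "DDR4", "DDR3", "DDR3"],
})

# True where a model has at least one row with that memory type
presence = pd.crosstab(toy["ram_model"], toy["memory_type"]) > 0

# Models covering every memory type, and models tied to exactly one type
all_types = presence.index[presence.all(axis=1)].tolist()
one_type = presence.index[presence.sum(axis=1) == 1].tolist()

print(presence)
print("all types:", all_types)
print("one type:", one_type)
```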

#### - RAM_MODEL and MEMORY_DIM Attributes	- NV, 

DONE -> to be verified

In [None]:
## SPECIFIC CASE
x = "ram_model"
y = "memory_dim"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].unique().value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()



print("\n####-------------------------x and y as given -------------------------------------")
y = "ram_model"
x = "memory_dim"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- 59 ram_model values are each associated with a single memory size (OK)
- one ram_model (Crucial) is associated with 17 memory sizes
- the memory size associated with a single ram_model (Kingston Valueram) is 0.12
- the memory size associated with 203 ram_model values is 8GB
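
The MIN/MAX pattern used in the cell above (count distinct values per group, then keep the groups at the minimum and at the maximum) can be sketched as one reusable helper. This is only a minimal sketch: `cardinality_extremes` is a hypothetical name, and `toy` is an invented stand-in for `df_join`, not the real dataset.

```python
import pandas as pd


def cardinality_extremes(df, key, value):
    """Return (min_rows, max_rows): the keys with the fewest and the most distinct values."""
    counts = df.groupby(key)[value].nunique().reset_index(name="count")
    min_rows = counts[counts["count"] == counts["count"].min()]
    max_rows = counts[counts["count"] == counts["count"].max()]
    return min_rows, max_rows


# Toy stand-in for df_join (hypothetical values)
toy = pd.DataFrame({
    "ram_model": ["A", "A", "B", "C", "C", "C"],
    "memory_dim": [4.0, 8.0, 8.0, 2.0, 4.0, 8.0],
})

# How many distinct ram_model values does each memory_dim have?
min_rows, max_rows = cardinality_extremes(toy, "memory_dim", "ram_model")
print(min_rows)
print(max_rows)
```

Calling the helper a second time with `key` and `value` swapped gives the "swapped" view without reassigning `x` and `y`.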

#### - RAM_MODEL and CLOCK Attributes		- NV, 

DONE -> verify

In [None]:
## SPECIFIC CASE
x = "ram_model"
y = "clock"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].unique().value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()



print("\n####-------------------------x and y normal -------------------------------------")
y = "ram_model"
x = "clock"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- 80 ram_model values are each associated with a single clock frequency (OK)
- one ram_model (Corsair Vengeance) has 26 clock frequencies (OK)
- 6 clock frequencies are each associated with a single ram_model
- the clock frequency 1600 has the most associated ram_model values (164)
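
Statements such as "80 ram_model values have a single clock frequency" come from the first output of the cell, `groupby(x)[y].nunique().value_counts().sort_index()`. The sketch below shows how to read that output on invented data (`toy` is a hypothetical stand-in for `df_join`): the index is "number of distinct clocks per model" and the value is "how many models have that number".

```python
import pandas as pd

# Toy stand-in for df_join (hypothetical values)
toy = pd.DataFrame({
    "ram_model": ["A", "A", "B", "C", "C", "C", "D"],
    "clock": [1600, 1866, 1600, 1333, 1600, 2133, 1600],
})

# Step 1: distinct clock values for each ram_model
per_model = toy.groupby("ram_model")["clock"].nunique()

# Step 2: how many models share each distinct-count
distribution = per_model.value_counts().sort_index()
print(distribution)
```

If the first index entry is 1, its value is the number of models tied to exactly one clock; the last index entry is the largest number of clocks any model has.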

#### - RAM_MODEL and VENDOR Attributes		- NV, 

DONE -> verify

In [None]:
## SPECIFIC CASE
x = "ram_model"
y = "vendor"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].unique().value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()



print("\n####-------------------------x and y normal -------------------------------------")
y = "ram_model"
x = "vendor"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- 115 ram_model values are each associated with a single vendor (OK)
- one ram_model (Corsair Vengeance) is associated with 70 vendors (OK)
- 4 vendors have the fewest associated ram_model values (3)
- one vendor (geizhals_unknown) has the most associated ram_model values (162)

#### - RAM_MODEL and CONTINENT Attributes	- NV, 

DONE -> verify

In [None]:
## SPECIFIC CASE
x = "ram_model"
y = "continent"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].unique().value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()



print("\n####-------------------------x and y normal -------------------------------------")
y = "ram_model"
x = "continent"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- 87 ram_model values are linked to a single continent (80 of them only to Europe) (OK)
- 112 ram_model values are sold on every continent (OK)
- America is the continent with the fewest associated ram_model values (141)
- Europe is the continent with the most associated ram_model values (273)
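
The two coverage questions in these notes (which models appear on every continent, and which single continent the one-continent models belong to) can be answered directly rather than read off the printed tables. A minimal sketch on invented data follows; `toy` is a hypothetical stand-in for `df_join`.

```python
import pandas as pd

# Toy stand-in for df_join (hypothetical values)
toy = pd.DataFrame({
    "ram_model": ["A", "A", "A", "B", "C", "C", "D"],
    "continent": ["Europe", "America", "Oceania", "Europe",
                  "Europe", "America", "America"],
})

n_continents = toy["continent"].nunique()
per_model = toy.groupby("ram_model")["continent"].nunique()

# Models present on every continent
everywhere = per_model[per_model == n_continents].index.tolist()

# Models tied to exactly one continent, broken down by that continent
single_models = per_model[per_model == 1].index
only_where = (
    toy[toy["ram_model"].isin(single_models)]
    .drop_duplicates(["ram_model", "continent"])["continent"]
    .value_counts()
)
print(everywhere)
print(only_where)
```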

#### - RAM_MODEL and COUNTRY Attributes		- NV, 

DONE -> verify

In [None]:
## SPECIFIC CASE
x = "ram_model"
y = "country"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].unique().value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()



print("\n####-------------------------x and y normal -------------------------------------")
y = "ram_model"
x = "country"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- 14 ram_model values are sold in a single country (10 of them in the UK) (OK)
- 29 ram_model values are sold in 11 countries (OK)
- Italy is the country with the fewest associated ram_model values (44)
- Germany is the country with the most associated ram_model values (217)


#### - RAM_MODEL and REGION Attributes		- NV, 

DONE -> verify

In [None]:
## SPECIFIC CASE
x = "ram_model"
y = "region"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()


print("\n####-------------------------x and y normal -------------------------------------")
y = "ram_model"
x = "region"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- 2 ram_model values are sold in a single region (OK)
- 5 ram_model values are sold in 75 regions (OK)
- the region with the fewest ram_model values sold (23) is north italy
- 2 regions share the highest number of ram_model values sold (188)

#### - MEMORY_DIM and CURRENCY Attributes	- NV, 

DONE -> verify

In [None]:
## SPECIFIC CASE
x = "memory_dim"
y = "currency"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].unique().value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()




print("\n####-------------------------x and y normal -------------------------------------")
y = "memory_dim"
x = "currency"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- 2 memory sizes are associated with a single currency (EURO, 128-250MB) (OK)
- 13 memory sizes are associated with every currency (OK)
- the currencies with the fewest associated memory sizes (13) are CAD and NZD
- the currency with the most associated memory sizes (18) is EUR
- the least-sold memory size (870) is 256GB
- the most-sold memory size (829243) is 8GB
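
The sales figures in the last two notes are taken from `groupby(...).count()` read on the `id_ram` column. In pandas, `count()` skips missing values while `size()` counts every row, so the two only agree when `id_ram` has no NaN. The sketch below illustrates the difference on invented data; whether `df_join` actually contains missing `id_ram` values is not stated here, so this is an assumption to check rather than a known problem.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_join (hypothetical values, with one missing id_ram)
toy = pd.DataFrame({
    "memory_dim": [8.0, 8.0, 8.0, 256.0, 256.0],
    "id_ram": [1.0, 2.0, np.nan, 4.0, 5.0],
})

# count() ignores NaN in the selected column
by_count = toy.groupby("memory_dim")["id_ram"].count()

# size() counts rows regardless of NaN
by_size = toy.groupby("memory_dim").size()

print(by_count)
print(by_size)
print("most rows:", by_size.idxmax(), "fewest rows:", by_size.idxmin())
```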

#### - CLOCK and MEMORY_TYPE Attributes		- NV, 

DONE -> verify

In [None]:
## SPECIFIC CASE
x = "clock"
y = "memory_type"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].unique().value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(y)[x].unique().value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + x + " , WHILE THE 2ND is " + y)
print("We want to verify if some value of " + y + " is associated with more than one value of " + x)
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()



print("\n####-------------------------x and y normal -------------------------------------")
y = "clock"
x = "memory_type"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- 29 clock values are each associated with a single memory type (OK)
- only one clock (1600) is associated with 4 memory types
- the memory type with the fewest associated clock frequencies (1) is DDR3U
- the memory type with the most associated clock frequencies (25) is DDR4
- the clock frequency with the fewest sales (28) is 3666
- the clock frequency with the most sales (699161) is 1600
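
A contingency table is one compact way to see both directions of the clock / memory_type relationship at once, and to check whether clock determines memory_type. This is a sketch using `pd.crosstab` on invented data (`toy` is a hypothetical stand-in for `df_join`), offered as an alternative view rather than as the method used in the cell above.

```python
import pandas as pd

# Toy stand-in for df_join (hypothetical values)
toy = pd.DataFrame({
    "clock": [1600, 1600, 1600, 2133, 2133, 3666],
    "memory_type": ["DDR3", "DDR3L", "DDR3", "DDR4", "DDR4", "DDR4"],
})

# Rows: clock, columns: memory_type, cells: number of records
table = pd.crosstab(toy["clock"], toy["memory_type"])

# Distinct memory types per clock, and distinct clocks per memory type
types_per_clock = (table > 0).sum(axis=1)
clocks_per_type = (table > 0).sum(axis=0)

# True only if every clock maps to exactly one memory type
clock_determines_type = bool((types_per_clock == 1).all())

print(table)
print(types_per_clock)
print(clock_determines_type)
```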

#### - CLOCK and COUNTRY Attributes		- NV, 

DONE -> verify

In [None]:
## SPECIFIC CASE
x = "clock"
y = "country"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].unique().value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()


print("\n####-------------------------x and y normal -------------------------------------")
y = "clock"
x = "country"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- one clock frequency (3666) is associated with a single country (America) (OK)
- 16 clock frequencies are sold in 11 countries (OK)
- Italy is the country with the fewest associated clock frequencies (18)
- the countries with the most associated clock frequencies (41) are Germany and Spain

#### - CLOCK and REGION Attributes		- NV, 

DONE -> verify

In [None]:
## SPECIFIC CASE
x = "clock"
y = "region"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + y + " , WHILE THE 2ND is " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(y)[x].unique().value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWED IS " + x + " , WHILE THE 2ND is " + y)
print("We want to verify if some value of " + y + " is associated with more than one value of " + x)
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()



print("\n####-------------------------x and y normal -------------------------------------")
y = "clock"
x = "region"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- there is one clock frequency (3666) associated with 7 regions (OK)
- there are 10 clock frequencies present in 75 regions (OK)
- 2 regions have the fewest associated clock frequencies (12)
- 15 regions have the most associated clock frequencies (41)
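
The MIN/MAX pattern used in the cell above (count distinct values per group, then keep the rows that hit the minimum or maximum) can be factored into a single helper. The following is only a minimal sketch: `cardinality_extremes` is a hypothetical name, and `toy` is a made-up stand-in for `df_join` whose values are invented for illustration.

```python
import pandas as pd


def cardinality_extremes(df, group_col, value_col):
    """Count distinct value_col per group_col and return
    (all counts, rows at the minimum, rows at the maximum)."""
    counts = (
        df.groupby(group_col)[value_col]
        .nunique()
        .reset_index(name="count")
    )
    min_rows = counts[counts["count"] == counts["count"].min()]
    max_rows = counts[counts["count"] == counts["count"].max()]
    return counts, min_rows, max_rows


# Hypothetical stand-in for df_join (invented values)
toy = pd.DataFrame({
    "region": ["tuscany", "tuscany", "bavaria", "bavaria", "bavaria", "texas"],
    "clock":  [2400, 3666, 2400, 2666, 3666, 3666],
})

counts, min_rows, max_rows = cardinality_extremes(toy, "region", "clock")
print(min_rows)
print(max_rows)
```

Calling the helper a second time with the two column names swapped would cover the "swapped" half of the cell.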

#### - VENDOR and TIME_CODE Attributes - NV

DONE -> verify

In [None]:
## SPECIFIC CASE
x = "vendor"
y = "time_code"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


# unique() yields one array per group; arrays are unhashable, so convert each to a tuple before counting
df_temp = df_join.groupby(y)[x].unique().apply(tuple).value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + x + " , WHILE THE 2ND IS " + y)
print("We want to verify if some value of " + y + " is associated with more than one value of " + x)
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()



print("\n####-------------------------x and y in normal order -------------------------------------")
y = "vendor"
x = "time_code"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- there is one vendor associated with only 2 time codes
- there is a single vendor that has 1831 time_code values
- there are 723 time codes with a single associated vendor
- there are 2 time codes with the highest number of associated vendors (74)
- Monoprice is the vendor with the fewest associated time_code values (2)
- geizhals_unknown is the vendor with the most associated time_code values (1831)
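
As a cross-check of the vendor names listed above, `Series.idxmin` and `Series.idxmax` return the label of the extreme group directly. This is a minimal sketch on a made-up frame: `toy` and its values are assumptions, not taken from `df_join`. Both methods return only the first label when several groups tie, whereas the equality filter used in the cell keeps every tied row.

```python
import pandas as pd

# Hypothetical stand-in for df_join; values are invented for illustration
toy = pd.DataFrame({
    "vendor": ["A", "A", "A", "B", "B", "C"],
    "time_code": pd.to_datetime([
        "2020-01-01", "2020-01-02", "2020-01-03",
        "2020-01-01", "2020-01-01", "2020-01-02",
    ]),
})

# Number of distinct time codes per vendor
per_vendor = toy.groupby("vendor")["time_code"].nunique()
fewest_vendor = per_vendor.idxmin()  # first label among ties
most_vendor = per_vendor.idxmax()

# Number of time codes that have exactly one vendor
per_time_code = toy.groupby("time_code")["vendor"].nunique()
single_vendor_codes = int((per_time_code == 1).sum())

print(fewest_vendor, per_vendor.min())
print(most_vendor, per_vendor.max())
print(single_vendor_codes)
```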

#### - VENDOR and WEEK Attributes - NV

DONE -> verify

In [None]:
## SPECIFIC CASE
x = "vendor"
y = "week"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


# unique() yields one array per group; arrays are unhashable, so convert each to a tuple before counting
df_temp = df_join.groupby(x)[y].unique().apply(tuple).value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()



print("\n####-------------------------x and y in normal order -------------------------------------")
y = "vendor"
x = "week"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- there is one vendor (Monoprice) that sold in only 2 weeks (OK)
- there are 40 vendors that sell throughout the whole year (OK)
- the weeks with the fewest vendors (49) are weeks 17 and 18
- the week with the most vendors (76) is week 15
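
The "sell throughout the whole year" count above can be reproduced by comparing each vendor's number of distinct weeks with the total number of distinct weeks in the data. This is a minimal sketch which assumes that "whole year" means every week present in the dataset; `toy` is a made-up stand-in for `df_join`.

```python
import pandas as pd

# Hypothetical stand-in for df_join; values are invented for illustration
toy = pd.DataFrame({
    "vendor": ["A", "A", "A", "B", "B", "C"],
    "week":   [1, 2, 3, 1, 3, 1],
})

# Vendors whose distinct-week count equals the number of weeks in the data
total_weeks = toy["week"].nunique()
weeks_per_vendor = toy.groupby("vendor")["week"].nunique()
always_active = weeks_per_vendor[weeks_per_vendor == total_weeks].index.tolist()

# Weeks with the largest number of distinct vendors (all ties are kept)
vendors_per_week = toy.groupby("week")["vendor"].nunique()
busiest_weeks = vendors_per_week[
    vendors_per_week == vendors_per_week.max()
].index.tolist()

print(always_active)
print(busiest_weeks)
```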

#### - VENDOR and MEMORY_TYPE Attributes - NV

DONE -> verify

In [None]:
## SPECIFIC CASE
x = "vendor"
y = "memory_type"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


# unique() yields one array per group; arrays are unhashable, so convert each to a tuple before counting
df_temp = df_join.groupby(x)[y].unique().apply(tuple).value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()



print("\n####-------------------------x and y in normal order -------------------------------------")
y = "vendor"
x = "memory_type"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- there is one vendor (YoYoTech) that sells a single RAM type (DDR4) (OK)
- there is one vendor (geizhals_unknown) that sells all RAM types (7)
- there are 3 RAM types with a single associated vendor
- there are 2 RAM types with the most associated vendors (77)
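
With only a handful of memory types, a presence matrix built with `pd.crosstab` gives both directions of the comparison at once. This is a sketch on invented data (`toy` is a hypothetical stand-in for `df_join`), offered as an alternative view and not as the notebook's own method.

```python
import pandas as pd

# Hypothetical stand-in for df_join; values are invented for illustration
toy = pd.DataFrame({
    "vendor":      ["A", "A", "B", "C", "C"],
    "memory_type": ["DDR3", "DDR4", "DDR4", "DDR4", "DDR4"],
})

# True where a vendor has at least one row for a memory type
presence = pd.crosstab(toy["vendor"], toy["memory_type"]) > 0

types_per_vendor = presence.sum(axis=1)   # distinct types sold by each vendor
vendors_per_type = presence.sum(axis=0)   # distinct vendors selling each type

# Vendors covering every type, and types sold by exactly one vendor
full_range_vendors = types_per_vendor[
    types_per_vendor == presence.shape[1]
].index.tolist()
exclusive_types = vendors_per_type[vendors_per_type == 1].index.tolist()

print(full_range_vendors)
print(exclusive_types)
```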

#### - VENDOR and CONTINENT Attributes - NV

DONE -> verify

In [None]:
## SPECIFIC CASE
x = "vendor"
y = "continent"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


# unique() yields one array per group; arrays are unhashable, so convert each to a tuple before counting
df_temp = df_join.groupby(x)[y].unique().apply(tuple).value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()



print("\n####-------------------------x and y in normal order -------------------------------------")
y = "vendor"
x = "continent"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- there are 53 vendors that sell in 2 continents, while 25 sell in a single continent (14 in America and 11 in Europe) (OK)
- America is the continent with the fewest vendors (22)
- Europe is the continent with the most vendors (64)
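
The split of single-continent vendors by continent can be obtained by isolating the vendors with exactly one continent and then counting one row per vendor. This is a minimal sketch on invented data; `toy` is a hypothetical stand-in for `df_join`.

```python
import pandas as pd

# Hypothetical stand-in for df_join; values are invented for illustration
toy = pd.DataFrame({
    "vendor":    ["A", "A", "B", "C", "C", "D"],
    "continent": ["Europe", "America", "Europe", "America", "America", "Europe"],
})

# Vendors that appear in exactly one continent
continents_per_vendor = toy.groupby("vendor")["continent"].nunique()
single = continents_per_vendor[continents_per_vendor == 1].index

# One row per (vendor, continent) pair, then count vendors per continent
breakdown = (
    toy[toy["vendor"].isin(single)]
    .drop_duplicates(["vendor", "continent"])["continent"]
    .value_counts()
)

print(continents_per_vendor.value_counts().sort_index())
print(breakdown)
```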

#### - VENDOR and COUNTRY Attributes - NV

DONE -> verify

In [None]:
## SPECIFIC CASE
x = "vendor"
y = "country"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


# unique() yields one array per group; arrays are unhashable, so convert each to a tuple before counting
df_temp = df_join.groupby(x)[y].unique().apply(tuple).value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()



print("\n####-------------------------x and y in normal order -------------------------------------")
y = "vendor"
x = "country"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- there are 14 vendors that sell in a single country (USA) (OK)
- there are 3 vendors that sell in 4 countries (OK)
- Italy is the country with the fewest vendors (3)
- Australia is the country with the most vendors (35)

#### - CONTINENT and COUNTRY Attributes - V

DONE -> verify

In [None]:
## SPECIFIC CASE
x = "continent"
y = "country"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


# unique() yields one array per group; arrays are unhashable, so convert each to a tuple before counting
df_temp = df_join.groupby(x)[y].unique().apply(tuple).value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()



print("\n####-------------------------x and y in normal order -------------------------------------")
y = "continent"
x = "country"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- there are 2 continents with 2 countries each (America, Oceania)
- the continent with the most countries (7) is Europe
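
The question printed at the top of the cell (whether some value of one attribute is associated with more than one value of the other) amounts to checking whether one column determines the other. This is a minimal sketch of that check: `determines` is a hypothetical helper name and `toy` is an invented stand-in for `df_join`.

```python
import pandas as pd


def determines(df, determinant, dependent):
    """Return True if every value of `determinant` is associated
    with at most one distinct value of `dependent`."""
    return bool((df.groupby(determinant)[dependent].nunique() <= 1).all())


# Hypothetical stand-in for df_join; values are invented for illustration
toy = pd.DataFrame({
    "continent": ["Europe", "Europe", "Europe", "Oceania"],
    "country":   ["Italy", "Italy", "Germany", "Australia"],
})

country_to_continent = determines(toy, "country", "continent")
continent_to_country = determines(toy, "continent", "country")

print(country_to_continent)
print(continent_to_country)
```

In the toy data each country maps to one continent, while a continent maps to several countries, so the check holds in one direction only.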

#### - CONTINENT and REGION Attributes - ?&V

DONE -> verify

In [None]:
## SPECIFIC CASE
x = "continent"
y = "region"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


# unique() yields one array per group; arrays are unhashable, so convert each to a tuple before counting
df_temp = df_join.groupby(x)[y].unique().apply(tuple).value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()



print("\n####-------------------------x and y in normal order -------------------------------------")
y = "continent"
x = "region"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- America and Oceania have 13 regions each
- Europe has 49

#### - REGION and WEEK Attributes - NV

DONE -> verify

In [None]:
## SPECIFIC CASE
x = "region"
y = "week"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


# unique() yields one array per group; arrays are unhashable, so convert each to a tuple before counting
df_temp = df_join.groupby(x)[y].unique().apply(tuple).value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()


print("\n####-------------------------x and y in normal order -------------------------------------")
y = "region"
x = "week"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts on swapped x and y -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- there are 59 regions where sales occur throughout the whole year (OK)
- there are 3 regions that sold in only 5 weeks of the year (OK)
- the weeks with the fewest regions are weeks 17 and 18
- the weeks with the most regions are weeks 11 through 14

#### - REGION and CURRENCY Attributes - ?&NV

DONE -> verify

In [None]:
## SPECIFIC CASE
x = "region"
y = "currency"
df_temp = df_join.groupby(y)[x].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + x + " , WHILE THE 2ND IS " + y)
print("We want to verify if some value of " + y + " is associated with more than one value of " + x)
print(df_temp)
print()


# unique() yields one array per group; arrays are unhashable, so convert each to a tuple before counting
df_temp = df_join.groupby(x)[y].unique().apply(tuple).value_counts()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()



print("\n####-------------------------x and y in normal order -------------------------------------")
y = "region"
x = "currency"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts with x and y swapped -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- 38 regions use the euro
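
The min/max inspection in the cell above is repeated for every attribute pair. A minimal sketch of how it could be condensed, assuming a hypothetical helper `cardinality_extremes` (not part of the notebook) and a small invented frame standing in for `df_join`:

```python
import pandas as pd

def cardinality_extremes(df, group_col, value_col):
    """Hypothetical helper: return (min_rows, max_rows), i.e. the groups of
    `group_col` with the fewest and the most distinct values of `value_col`."""
    counts = df.groupby(group_col)[value_col].nunique().reset_index(name="count")
    min_rows = counts[counts["count"] == counts["count"].min()]
    max_rows = counts[counts["count"] == counts["count"].max()]
    return min_rows, max_rows

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "region":   ["bavaria", "tuscany", "lazio", "queensland", "texas"],
    "currency": ["EUR", "EUR", "EUR", "AUD", "USD"],
})

fewest, most = cardinality_extremes(toy, "currency", "region")
print(most)    # currency used by the most distinct regions
print(fewest)  # currencies used by the fewest distinct regions
```

Calling the helper once per direction (`"currency", "region"` and then `"region", "currency"`) would cover both the "normal" and the "swapped" parts of the cell.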

#### - REGION and VENDOR Attributes		- NV, 

DONE -> to verify

In [None]:
## SPECIFIC CASE
x = "region"
y = "vendor"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].unique().apply(tuple).value_counts()  # tuples are hashable, the arrays returned by unique() are not
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()



print("\n####-------------------------x and y in normal order -------------------------------------")
y = "region"
x = "vendor"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts with x and y swapped -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- 6 regions are associated with a single vendor (OK)
- only one region has 35 vendors (queensland)
- 2 vendors sell in 3 regions
- the vendor that sells in the most regions (28) is Alternate

#### - REGION and COUNTRY Attributes		- ?&NV, 

DONE -> to verify

In [None]:
## SPECIFIC CASE
x = "region"
y = "country"
df_temp = df_join.groupby(x)[y].nunique().value_counts().sort_index()
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].unique().apply(tuple).value_counts()  # tuples are hashable, the arrays returned by unique() are not
print("\n-----------REMEMBER THAT THE FIRST ATTRIBUTE SHOWN IS " + y + " , WHILE THE 2ND IS " + x)
print("We want to verify if some value of " + x + " is associated with more than one value of " + y )
print(df_temp)
print()


df_temp = df_join.groupby(x)[y].value_counts()
print()
print(df_temp)
print()



print("\n####-------------------------x and y in normal order -------------------------------------")
y = "region"
x = "country"


df_temp=df_join.groupby(x)[y].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x]):
    if index < 3:
        print(df_join[df_join[x] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")   



print("\n####-------------------------x and y swapped -------------------------------------")
y1 = x
x1 = y

df_temp=df_join.groupby(x1)[y1].nunique().reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["count"] == df_temp["count"].min()]
print(df_temp1)
print()
for index, z in enumerate(df_temp1[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-")    


print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["count"] == df_temp["count"].max()]
print(df_temp2)
print()
for index, z in enumerate(df_temp2[x1]):
    if index < 3:
        print(df_join[df_join[x1] == z].head(3))
        print("######################################")
print(".-.-.-.-.-.-.-.-.-.-.-.-.-.-.-..-.-.-.-.-.-.-.-") 



print("\n####-------------------------counts with x and y swapped -------------------------------------")

df_temp=df_join.groupby(x1).count()#.reset_index(name="count")
#.sort_index()
#print(df_temp)
print()

print("MIN Considerations::\n\n")


#MIN:

df_temp1 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].min()]
print(df_temp1)
print()

print("MAX Considerations::\n")
print()


#MAX:    

df_temp2 = df_temp[df_temp["id_ram"] == df_temp["id_ram"].max()]
print(df_temp2)
print()

WHAT I CAN CONCLUDE:
- Germany and the UK have the largest number of regions (16 and 11)
- Belgium and Italy are the countries with the fewest regions

#### - ID_RAM Attribute

In [None]:
##------  ID → How many times each specific model was sold
df_temp = df_join.groupby('id_ram')['id_ram'].count()
print("\nThe model that sold the most is :: \t", df_temp.idxmax()) 
print("The model that sold the least is :: \t", df_temp.idxmin())



df_temp = df_join.groupby('id_ram').nunique()#.reset_index(name="count")
print(df_temp)
print()


WHAT I CAN CONCLUDE:
- I realized that the id_ram attribute does not identify the sale or the cart, but the set of components that identify the RAM:
    it is a unique code for each combination of RAM features
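
The observation above can be checked directly: if `id_ram` is a product key, no RAM-describing column should vary within one `id_ram`. A minimal sketch on invented rows; the choice of `feature_cols` is an assumption, and on the real data the check would run on `df_join`:

```python
import pandas as pd

# Toy rows, invented for illustration only
toy = pd.DataFrame({
    "id_ram":     [1, 1, 2, 2, 3],
    "brand":      ["KINGSTON", "KINGSTON", "CORSAIR", "CORSAIR", "GSKILL"],
    "memory_dim": [8.0, 8.0, 16.0, 16.0, 4.0],
    "vendor":     ["A", "B", "A", "A", "C"],
})

feature_cols = ["brand", "memory_dim"]  # assumed RAM-describing columns

# For each id_ram, count the distinct values of every feature column.
per_id = toy.groupby("id_ram")[feature_cols].nunique()

# id_ram behaves like a product key if no feature varies within an id.
is_product_key = bool((per_id == 1).all().all())
print(is_product_key)

# A sale-level attribute such as vendor can still vary within one id_ram.
vendors_per_id = toy.groupby("id_ram")["vendor"].nunique()
print(vendors_per_id.max())
```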

## Data Distributions

In [None]:
#--1.8---------------------  Distributions  -----------------------------------

# --------------   histograms  ---------------
mpl.rc('figure', max_open_warning = 0)  # because there are 21 attributes
sturge_number = math.trunc(np.log2(len(df_join)))
for col in df_join:  
    print("\n\nCONSIDERATIONS ABOUT THE ATTRIBUTE: " + col)
    plt.figure(figsize = (10,8))  # create the figure before drawing on it
    df_join[col].hist(bins = sturge_number + 1)  #Sturges' rule
    pl.suptitle(col)    
    plt.xticks(rotation=45)
    plt.savefig(dir+'\\Histogram\\'+col+'-hist.jpg')  # save after the ticks have been rotated
    plt.show()
    print()

    print('\nThe number of unique values for the attribute ' + col + ' is:')
    print(df_join[col].nunique())
    print()

#   to see the value counts
    print('\nThe values count for the attribute ' + col + ' is:')
    print(df_join[col].value_counts())
    print()
    df_join[col].value_counts().to_csv('D:\\Dropbox\\Scuola\\Pisa\\Anno2\\Data Mining\\Esame\\Pratica\\python\\NewDataset\\Value Count\\'+col+'-value-count.csv')


CONSIDERATIONS:
- the distributions of sales_usd, sales_currency and memory_dim need to be checked
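
The histogram cell above picks its bin count with Sturges' rule. A small sketch of the rule, assuming the common `ceil(log2(n)) + 1` form; the notebook truncates instead of rounding up, so it can end up with one bin fewer. The helper name `sturges_bins` and the sample are illustrative only:

```python
import math
import numpy as np

def sturges_bins(n):
    """Number of histogram bins by Sturges' rule: ceil(log2(n)) + 1."""
    return math.ceil(math.log2(n)) + 1

n = 1000
print(sturges_bins(n))              # rule with the ceiling
print(math.trunc(np.log2(n)) + 1)   # truncating variant used in the cell above

# NumPy exposes the same estimator through bins="sturges"
rng = np.random.default_rng(0)
sample = rng.normal(size=n)
edges = np.histogram_bin_edges(sample, bins="sturges")
print(len(edges) - 1)
```

Sturges' rule assumes roughly normal data, so for heavily skewed columns such as prices the resulting histogram may be too coarse.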

## Outliers
- highest and lowest sales_usd
    - I decide to work only on sales_usd; sales_currency will be removed later

Outliers from Relationships

In [None]:
"""#-------------- the brand that sold the least is DANE-ELEC
outliers = vs[(vs[x] < left_sale) | (vs[x] > right_sale)]
df_join.drop(outliers.index, inplace=True, errors="ignore")

#-------------- the brands that sold on the fewest (distinct) days (2) are DANE-ELEC and GALAX/KFA2
outliers = vs[(vs[x] < left_sale) | (vs[x] > right_sale)]
df_join.drop(outliers.index, inplace=True, errors="ignore")

#-------------- the model that was sold only once is the Kingston Lv Xmp 10Th Anniversary
    #-------------because it seems to be a special edition
outliers = vs[(vs[x] < left_sale) | (vs[x] > right_sale)]
df_join.drop(outliers.index, inplace=True, errors="ignore")

#-------------- the vendor with the fewest sales (7) seems to be Monoprice, which sells only 4 distinct models
outliers = vs[(vs[x] < left_sale) | (vs[x] > right_sale)]
df_join.drop(outliers.index, inplace=True, errors="ignore")

#-------------- the region with the fewest RAM purchases (non-unique) (126) is south italy (Europe) (MAYBE!!)
outliers = vs[(vs[x] < left_sale) | (vs[x] > right_sale)]
df_join.drop(outliers.index, inplace=True, errors="ignore")

#-------------- one brand (PAREEMA) sells in only one week of the year (16) (could be removed as an outlier)
outliers = vs[(vs[x] < left_sale) | (vs[x] > right_sale)]
df_join.drop(outliers.index, inplace=True, errors="ignore")

#-------------- 3 brands sell in only 2 weeks of the year (could be removed as outliers)
outliers = vs[(vs[x] < left_sale) | (vs[x] > right_sale)]
df_join.drop(outliers.index, inplace=True, errors="ignore")

#-------------- one model (Galax Hof) has 2 associated brands (OUTLIERS)
outliers = vs[(vs[x] < left_sale) | (vs[x] > right_sale)]
df_join.drop(outliers.index, inplace=True, errors="ignore")

#-------------- 3 brands sell in the fewest regions (4) (CHECK THE SALES VOLUMES)
outliers = vs[(vs[x] < left_sale) | (vs[x] > right_sale)]
df_join.drop(outliers.index, inplace=True, errors="ignore")

#-------------- 2 ram_model values are sold in a single region (OK)
outliers = vs[(vs[x] < left_sale) | (vs[x] > right_sale)]
df_join.drop(outliers.index, inplace=True, errors="ignore")

#-------------- the memory size associated with a single ram_model (Kingston Valueram) is 0.12 (CHECK)
outliers = vs[(vs[x] < left_sale) | (vs[x] > right_sale)]
df_join.drop(outliers.index, inplace=True, errors="ignore")

#-------------- 4 vendors have the smallest number of associated ram_model values (3)
outliers = vs[(vs[x] < left_sale) | (vs[x] > right_sale)]
df_join.drop(outliers.index, inplace=True, errors="ignore")


id_ram with a single associated time_code"""

In [22]:
test=df_join.groupby("id_ram").count().sort_values(by="time_code", ascending=True)
test.head(50)

import statistics
list2 = test["time_code"].tolist()
print(statistics.mean(list2))

print(test.describe())



1080.9298809906722
       time_code  sales_usd  sales_currency   brand  ram_model  memory_dim  \
count    3109.00    3109.00         3109.00 3109.00    3109.00     3109.00   
mean     1080.93    1080.93         1080.93 1080.93    1080.93     1080.93   
std      1240.95    1240.95         1240.95 1240.95    1240.95     1240.95   
min         1.00       1.00            1.00    1.00       1.00        1.00   
25%       107.00     107.00          107.00  107.00     107.00      107.00   
50%       619.00     619.00          619.00  619.00     619.00      619.00   
75%      1813.00    1813.00         1813.00 1813.00    1813.00     1813.00   
max      8840.00    8840.00         8840.00 8840.00    8840.00     8840.00   

       memory_type   clock  vendor  continent  country  region  currency  \
count      3109.00 3109.00 3109.00    3109.00  3109.00 3109.00   3109.00   
mean       1080.93 1080.93 1080.93    1080.93  1080.93 1080.93   1080.93   
std        1240.95 1240.95 1240.95    1240.95  124

In [None]:
#--1.9---------------------  Box Plots & Outliers  ----------------------------
    
for col in num_float:  
    fig, ax = plt.subplots()
    ax.set_title('\nOutliers of ' +col+ ' in the Dataset')
    ax.boxplot(df_join[col])
    plt.show()

    print("---------------------------------------------------------------")
    
    
attr_outliers=["sales_usd"]
"""for col in attr_outliers:
    df_join['zscore']=(df_join[col]-df_join[col].mean())/df_join[col].std()
    df_join[col] = np.where(df_join['zscore']<-3, df_join[col].mean(), df_join[col])
    df_join[col] = np.where(df_join['zscore']>3, df_join[col].mean(), df_join[col])

#   Sanity check - memory_dim does NOT return 0
for col in attr_outliers:
    df_join['zscore']=(df_join[col]-df_join[col].mean())/df_join[col].std()
    print(df_join[(df_join['zscore']<-3) | (df_join['zscore']>3)].shape[0])    # must be 0 for every attribute

del df_join['zscore']"""

In [None]:
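def iqr_values(s):
    """Return the (lower, upper) bounds used to filter outliers in the series s.

    Note: the lower bound is the 15th percentile itself, not the usual
    first quartile (0.25) minus a multiple of the IQR.
    """
    q1 = s.quantile(q = 0.15)
    q3 = s.quantile(q = 0.75)
    iqr = q3 - q1
    #iqr_left = q1 - 2*iqr
    iqr_right = q3 + 2*iqr   # upper bound: 2 * IQR above the third quartile
    return q1, iqr_right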
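
For comparison with the custom bounds above, here is a sketch of the classic Tukey fences (Q1 - 1.5·IQR, Q3 + 1.5·IQR). The helper name `tukey_fences` and the price values are invented for illustration; this is not the rule the notebook applies.

```python
import pandas as pd

def tukey_fences(s, k=1.5):
    """Classic Tukey fences: (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Made-up prices with one obvious extreme value
prices = pd.Series([10.0, 12.0, 13.0, 14.0, 15.0, 16.0, 18.0, 500.0])

low, high = tukey_fences(prices)
outliers = prices[(prices < low) | (prices > high)]
print(low, high)
print(outliers.tolist())
```

Unlike `iqr_values`, these fences are symmetric around the quartiles, so they do not discard the lowest 15% of rows by construction.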

In [None]:
df_join["sales_usd"].plot.kde()

In [None]:
df_join.sales_usd.skew()

In [None]:
df_join.sales_usd.kurt()

In [None]:
df_join.corr()

In [None]:
x="sales_usd"

print("OUTLIERS REPRESENTATION FOR ATTRIBUTE :\t" + x)
left_sale, right_sale = iqr_values(df_join[x])
df_join[(df_join[x] > left_sale) & (df_join[x] < right_sale)][x].plot.box()
plt.show()
print()
outliers = df_join[(df_join[x] < left_sale) | (df_join[x] > right_sale)]
print("\n Outliers found")
print()
print(outliers.describe())
print()
print("--------------------------------------------------")
print("\n Dataset dropped")
df_dropped = df_join[(df_join[x] > left_sale) & (df_join[x] < right_sale)]
print(df_dropped.describe())
print()

In [None]:
df_dropped["sales_usd"].plot.kde()

In [None]:
df_dropped["sales_usd"].skew()

In [None]:
df_dropped["sales_usd"].kurt()
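
The `skew()` and `kurt()` calls before and after the outlier filter are meant to show how much the tail was reduced. A minimal sketch of that comparison on synthetic data; the log-normal sample and the 95th-percentile cut are assumptions chosen only to mimic a right-skewed price column:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Log-normal draws stand in for a right-skewed price column (invented data)
prices = pd.Series(rng.lognormal(mean=4.0, sigma=1.0, size=5000))

# pandas reports sample skewness and excess kurtosis (0 for a normal distribution)
print(prices.skew(), prices.kurt())

# Drop the upper tail, similar in spirit to the df_dropped filter above
trimmed = prices[prices < prices.quantile(0.95)]
print(trimmed.skew(), trimmed.kurt())
```

Both statistics should fall after trimming; a skewness still well above 0 would suggest the column remains right-skewed.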

## Correlation

In [None]:
df_dropped.drop(labels='week', inplace=True, axis=1)

In [None]:
#--1.6---------------------  Correlation  -------------------------------------

print("\nCorrelation Matrix with Pearson::")
corr = df_dropped.corr(method='pearson')
sns.set(font_scale=0.8)
plt.figure(figsize = (7,7))
ax = sns.heatmap(corr, vmin=-0.1, vmax=1, linewidths=1)
ax.set_xticklabels(ax.get_xticklabels(),rotation=45,horizontalalignment='right')   #Look the method
#plt.rc('figure', figsize=(18, 7))
plt.show()

print("\nCorrelation Matrix with Spearman::")
corr = df_dropped.corr(method='spearman')
sns.set(font_scale=0.8)
plt.figure(figsize = (7,7))
ax = sns.heatmap(corr, vmin=-0.1, vmax=1, linewidths=1)
ax.set_xticklabels(ax.get_xticklabels(),rotation=45,horizontalalignment='right')   #Look the method
#plt.rc('figure', figsize=(18, 7))
plt.show()

print("\nHeatmap correlation::")
sns.heatmap(df_dropped.corr(), annot=True);
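
The cell above computes both Pearson and Spearman matrices. A short sketch of why the two can disagree, using an invented monotonic but non-linear relationship: Pearson measures linear association, Spearman measures rank (monotonic) association.

```python
import numpy as np
import pandas as pd

x = np.linspace(1, 10, 50)
toy = pd.DataFrame({"x": x, "y": np.exp(x)})  # monotonic but strongly non-linear

pearson = toy.corr(method="pearson").loc["x", "y"]
spearman = toy.corr(method="spearman").loc["x", "y"]
print(round(pearson, 3), round(spearman, 3))
```

A large gap between the two coefficients for a pair of columns hints at a non-linear relationship or at the influence of extreme values.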

#### Correlation by scatter plots

In [None]:
plt.figure(figsize = (15,15))
plt.scatter(df_dropped["sales_usd"], df_dropped["sales_currency"])
plt.xlabel("sales_usd")
plt.ylabel("sales_currency")
plt.title('\nCorrelation between sales_usd and sales_currency')
plt.show()

## Data Visualization
-1.7---------------------  Visualization  ------------------------------
"""histograms are used to show distributions of variables 
while bar charts are used to compare variables. 
Histograms plot binned quantitative data 
while bar charts plot categorical data"""
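
The distinction in the note above shows up in the data each chart needs. A minimal sketch with invented values: a histogram starts from counts per numeric bin, a bar chart from counts per category.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "sales_usd":   [20.0, 35.0, 40.0, 55.0, 60.0, 90.0],               # quantitative
    "memory_type": ["DDR3", "DDR4", "DDR4", "DDR3", "DDR4", "DDR4"],   # categorical
})

# Histogram input: counts of values falling into numeric intervals (bins)
counts, edges = np.histogram(toy["sales_usd"], bins=3)
print(counts, edges)

# Bar chart input: one count per category, no binning involved
per_category = toy["memory_type"].value_counts()
print(per_category)
```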

### Bar charts (WORK)

In [None]:
#----------bar charts / Categorical attributes and comparing variables
for x in df_dropped.columns:
    plt.figure(figsize = (20,10))
    if x != "time_code":
        df_dropped.groupby([df_dropped.time_code.dt.year , df_dropped.time_code.dt.month])[x].nunique().plot(kind="bar", title="test")
        plt.xticks(rotation=90, horizontalalignment="center")
        plt.title(x)
        plt.xlabel("Group by Years and Months")
        plt.ylabel("nunique()")
        plt.show()

In [None]:
df_dropped.groupby(df_dropped.time_code.dt.year)['id_ram'].count().plot(kind="bar", title="test")
plt.xticks(rotation=30, horizontalalignment="center")
plt.title("Number of purchases of ram per year")
plt.xlabel("Years")
plt.ylabel("Rams bought")

In [None]:
df_dropped['time_code'].min()

In [None]:
df_dropped['time_code'].max()

In 2013 the data start in March and cover 8 months, while in 2018 the data end in April, so that bar covers only roughly the first 4 months.
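
Partial years like these can be flagged before comparing yearly totals. A minimal sketch on invented timestamps; in the notebook the input would be `df_dropped["time_code"]`, and the `coverage` table is a hypothetical name:

```python
import pandas as pd

# Invented timestamps, for illustration only
time_code = pd.Series(pd.to_datetime([
    "2013-03-15", "2013-07-01", "2013-12-20",
    "2014-01-05", "2014-06-10", "2014-12-31",
    "2018-01-02", "2018-04-13",
]))

# First and last observed month within each calendar year
first_month = time_code.groupby(time_code.dt.year).min().dt.month
last_month = time_code.groupby(time_code.dt.year).max().dt.month

coverage = pd.DataFrame({"first_month": first_month, "last_month": last_month})
# A year is partial if it does not run from January through December
coverage["partial"] = (coverage["first_month"] > 1) | (coverage["last_month"] < 12)
print(coverage)
```

Years marked as partial are better compared on a per-month basis than by raw yearly counts.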


In [None]:
df_dropped.groupby('continent')['id_ram'].count().plot(kind="bar", title="test")
plt.style.use('ggplot')
plt.xticks(rotation=30, horizontalalignment="center")
plt.title("Number of purchases of ram per continent")
plt.xlabel("Continent")
plt.ylabel("Rams bought")

In [None]:
tmp=df_dropped.groupby('vendor')['id_ram'].count().sort_values(ascending=False)
tmp=tmp.head(10)
tmp.plot.barh()
plt.style.use('ggplot')
plt.gca().invert_yaxis()
plt.title("Top 10 Most selling vendors")
plt.ylabel("Vendors")
plt.xlabel("Rams sold")

In [None]:
tmp=df_dropped.groupby('brand')['time_code'].count().sort_values(ascending=False)
tmp=tmp.head(10)
tmp.plot.barh()
plt.style.use('ggplot')
plt.gca().invert_yaxis()
plt.title("Top 10 Most selling Brands")
plt.ylabel("Brand")
plt.xlabel("Rams sold")
plt.show()

In [None]:
tmp=df_dropped.groupby('brand')['time_code'].count().sort_values(ascending=False)
tmp=tmp.tail(10)
tmp.plot.barh()
plt.style.use('ggplot')
plt.gca().invert_yaxis()
plt.title("10 least-selling Brands")
plt.ylabel("Brand")
plt.xlabel("Rams sold")

In [None]:
tmp=df_dropped.groupby('brand')['id_ram'].nunique().sort_values(ascending=False)
tmp=tmp.head(10)
tmp.plot.barh(stacked=True)
plt.style.use('ggplot')
plt.gca().invert_yaxis()
plt.title("Top 10 Brands by number of distinct RAM products")
plt.ylabel("Brand")
plt.xlabel("Distinct id_ram values")
plt.show()

In [None]:
tmp=df_dropped.groupby('ram_model')['time_code'].count().sort_values(ascending=False)
tmp=tmp.head(10)
tmp.plot.barh()
plt.style.use('ggplot')
plt.gca().invert_yaxis()
plt.title("Top 10 Most selling ram models")
plt.ylabel("Ram Model")
plt.xlabel("Rams sold")

In [None]:
tmp=df_dropped.groupby('ram_model')['time_code'].count().sort_values(ascending=False)
tmp=tmp.tail(10)
tmp.plot.barh()
plt.style.use('ggplot')
plt.gca().invert_yaxis()
plt.title("10 least-selling ram models")
plt.ylabel("Ram Model")
plt.xlabel("Rams sold")

In [None]:
tmp=df_dropped.groupby('memory_dim')['time_code'].count().sort_values(ascending=False)
tmp=tmp.head(10)
tmp.plot.barh()
plt.style.use('ggplot')
plt.gca().invert_yaxis()
plt.title("Top 10 Most popular memory dimensions")
plt.ylabel("Memory Dims")
plt.xlabel("Rams sold")

In [None]:
tmp=df_dropped.groupby('memory_dim')['time_code'].count().sort_values(ascending=False)
tmp=tmp.tail(8)
tmp.plot.barh()
plt.style.use('ggplot')
plt.gca().invert_yaxis()
plt.title("8 Least popular memory dimensions")
plt.ylabel("Memory Dims")
plt.xlabel("Rams sold")

In [None]:
tmp=df_dropped.groupby('memory_type')['time_code'].count().sort_values(ascending=False)
#tmp=tmp.head(10)
tmp.plot.barh()
plt.style.use('ggplot')
plt.gca().invert_yaxis()
plt.title("Most popular memory technologies")
plt.ylabel("Memory Type")
plt.xlabel("Rams sold")

In [None]:
tmp=df_dropped.groupby('clock')['time_code'].count().sort_values(ascending=False)
tmp=tmp.head(10)
tmp.plot.barh()
plt.style.use('ggplot')
plt.gca().invert_yaxis()
plt.title("Top 10 Most popular clock frequencies")
plt.ylabel("Clock")
plt.xlabel("Rams sold")

In [None]:
x = "continent"
y = "memory_type"

plt.figure(figsize = (15,11))
df_dropped.groupby(y)[x].nunique().plot.barh(stacked=True)
#df_dropped.groupby("memory_type")["brand"].nunique().plot.barh(stacked=True)
#plt.style.use('ggplot')
plt.gca().invert_yaxis()
plt.title("Group by " + y + " and counting on " + x)
plt.xlabel(x)
plt.ylabel(y)
plt.show()

In [None]:
for x in cat:
    for y in cat:
        if (x != y) & (x != "id_ram") & (x != "time_code") & (x != "ram_model"):
            plt.figure(figsize = (15,11))
            #df_dropped.groupby(x)[y].count().plot.barh(stacked=True)
            df_dropped.groupby(x)[y].nunique().plot.barh(stacked=True)
            #df_dropped.groupby(x)[y].value_counts().plot.barh(stacked=True)
            #plt.style.use('ggplot')
            plt.gca().invert_yaxis()
            plt.title("Group by " + x + " and counting on " + y)
            plt.xlabel(y)
            plt.ylabel(x)
            plt.show()

DOES NOT WORK

In [None]:
"""plt.figure(figsize = (20,10))
df1 = df_dropped.groupby([df_dropped.time_code.dt.year , df_dropped.month])["sales_usd"].sum()#.unstack()
print(df1)
df1["continent"].plot(kind="bar", title="test", stacked=True)
#df_dropped.groupby([df_dropped.time_code.dt.year , df_dropped.time_code.dt.month])["continent"].nunique().plot(kind="bar", title="test", stacked=True)
plt.xticks(rotation=90, horizontalalignment="center")
plt.title(x)
plt.xlabel("Group by Years and Months")
plt.ylabel("sum")
plt.show()"""

### Stacked bar charts (WORK but WITHOUT "Stacking")

In [None]:
df_dropped.groupby(df_dropped.time_code.dt.year).continent.value_counts().unstack(0).plot.barh()
plt.style.use('ggplot')
plt.gca().invert_yaxis()
plt.title("Number of purchases of ram per continent")
plt.ylabel("Continent")
plt.xlabel("Rams bought")
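
The cell above draws grouped bars because `plot.barh()` is called without `stacked=True`. A minimal sketch of the table such a chart needs, on invented rows; the `year` column here is a stand-in for `time_code.dt.year`:

```python
import pandas as pd

# Toy purchases, invented for illustration only
toy = pd.DataFrame({
    "year":      [2016, 2016, 2016, 2017, 2017, 2017, 2017],
    "continent": ["Europe", "Europe", "America", "Europe", "Oceania", "America", "America"],
})

# Rows = continent, columns = year, cells = number of purchases
table = toy.groupby(["continent", "year"]).size().unstack(fill_value=0)
print(table)
```

Passing this table to `table.plot.barh(stacked=True)` should draw one bar per continent with the yearly counts stacked end to end rather than side by side.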

In [None]:
plt.rcParams["figure.figsize"] = [30,20]
tmp=df_dropped.groupby('vendor').continent.value_counts().sort_values(ascending=False)
tmp=tmp.head(15).unstack(0).plot.barh()
plt.style.use('ggplot')
plt.gca().invert_yaxis()
plt.title("Number of purchases of ram per continent")
plt.ylabel("Continent")
plt.xlabel("Rams bought")

In [None]:
plt.rcParams["figure.figsize"] = [30,20]
df1 = df_dropped.groupby([df_dropped.time_code.dt.year , df_dropped.time_code.dt.month,'continent']).count().unstack()
#plt.style.use('seaborn-dark-palette')
#plt.style.use('ggplot')
plt.style.use('fivethirtyeight')
#plt.style.use('seaborn-muted')

df1["time_code"].plot(kind="bar", stacked=True,sort_columns  =True)

#df_dropped.groupby([df_dropped.time_code.dt.year , df_dropped.time_code.dt.month])["continent"].nunique().plot(kind="bar", title="test", stacked=True)

plt.xticks(rotation=90, horizontalalignment="center")
plt.title("NotScaled")
plt.xlabel("Group by Years and Months")
plt.ylabel("sum")


#df1.plot(figsize=(20, 5))
#plt.savefig('Contxsell')
plt.show()

In [None]:
plt.rcParams["figure.figsize"] = [30,20]
df1 = df_dropped.groupby([df_dropped.time_code.dt.year , df_dropped.time_code.dt.month,'memory_type']).count().unstack()
#plt.style.use('Solarize_Light2')
#plt.style.use('seaborn-dark-palette')
plt.style.use('fivethirtyeight')

df1["time_code"].plot(kind="bar", stacked=True,sort_columns  =True)


#df_dropped.groupby([df_dropped.time_code.dt.year , df_dropped.time_code.dt.month])["continent"].nunique().plot(kind="bar", title="test", stacked=True)

plt.xticks(rotation=90, horizontalalignment="center")

plt.xlabel("Group by Years and Months")
plt.ylabel("sum")


#df1.plot(figsize=(20, 5))
#plt.savefig('Contxsell')
plt.show()

In [None]:
plt.rcParams["figure.figsize"] = [30,20]
for x in df_dropped.continent.unique():
    df2=df_dropped[df_dropped.continent==x]
    df1 = df2.groupby([df2.time_code.dt.year , df2.time_code.dt.month,'memory_type']).count().unstack()  # group the continent subset itself
    plt.style.use('Solarize_Light2')
    plt.style.use('fivethirtyeight')

    df1["time_code"].plot(kind="bar", stacked=True,sort_columns  =True)


    #df_dropped.groupby([df_dropped.time_code.dt.year , df_dropped.time_code.dt.month])["continent"].nunique().plot(kind="bar", title="test", stacked=True)
    plt.title('Ram memory type popularity by year and month in '+x)
    plt.xticks(rotation=90, horizontalalignment="center")

    plt.xlabel("Group by Years and Months")
    plt.ylabel("sum")


    #df1.plot(figsize=(20, 5))
    #plt.savefig('Contxsell')
    plt.show()

In [None]:
plt.rcParams["figure.figsize"] = [30,20]
focus='memory_dim'
for x in df_dropped.continent.unique():
    df2=df_dropped[df_dropped.continent==x]
    df1 = df2.groupby([df2.time_code.dt.year , df2.time_code.dt.month,focus]).count().unstack()  # group the continent subset itself
    plt.style.use('Solarize_Light2')

    df1["time_code"].plot(kind="bar", stacked=True,sort_columns  =True)


    #df_dropped.groupby([df_dropped.time_code.dt.year , df_dropped.time_code.dt.month])["continent"].nunique().plot(kind="bar", title="test", stacked=True)
    plt.title('Ram ' + focus + ' popularity by year and month in '+x)
    plt.xticks(rotation=90, horizontalalignment="center")

    plt.xlabel("Group by Years and Months")
    plt.ylabel("sum")
    plt.legend(loc='upper center', bbox_to_anchor=(0.5, 1.125),
          ncol=max(1, int(df_dropped[focus].nunique()/4)), fancybox=True, shadow=True)  # ncol must be at least 1


    #df1.plot(figsize=(20, 5))
    #plt.savefig('Contxsell')
    plt.show()

### Pie charts (WORK)

In [None]:
#---------pie charts 
sns.set(font_scale=1.5)
class_series = df_dropped.groupby('brand').size().sort_values(ascending=False)
class_series.plot.pie(autopct='%.2f', figsize = (12,12), fontsize=(14), shadow = True)
plt.title('Brands Distribution')  # the series above is grouped by brand
plt.savefig('D:\\Dropbox\\Scuola\\Pisa\\Anno2\\Data Mining\\Esame\\Pratica\\python\\NewDataset\\vendors-pie')
plt.show()

In [None]:
sns.set(font_scale=1.5)
class_series = df_dropped.groupby('memory_type').size()
class_series.plot.pie(autopct='%.2f', figsize = (12,12), fontsize=(14), shadow = True)
plt.title('Memory Types Distribution')

In [None]:
sns.set(font_scale=1.5)
class_series = df_dropped.groupby('continent').size()
class_series.name = 'Continents Distribution'
class_series.plot.pie(autopct='%.2f', figsize = (12,12), fontsize=(14), shadow = True)
plt.title('Continents')

plt.show()

### Donut charts (WORK)

In [None]:
plt.rcParams["figure.figsize"] = [8,8]
plt.rc('font', size=10)
df=df_dropped
layers = ['continent', 'country']
value = 'amount'
minimum_value = 2 # Skip values less than minimum

df['amount'] = [1 for x in range(len(df))]
city = df['country'].value_counts()
city = city[city > minimum_value]
df_big = df.loc[df['country'].isin(city.index)]   # reject small cities

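def percentage_growth(l):
    # Cumulative share of the total that precedes each slice (starts at 0)
    values = list(l)   # positional access, independent of the index labels
    total = sum(values)
    s = 0
    res = [0]
    for v in values[:-1]:
        s += v
        res.append(s / total)
    return res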
plt.axis("equal")

cmap = plt.get_cmap("rainbow")

for i, layer in enumerate(layers):
    radius = i + 2
    width = 1
    frame = df_big.groupby(layers[:i+1])['amount'].sum()
    colors = cmap(percentage_growth(frame))
    labels = [x[-1] if isinstance(x, tuple) else x for x in frame.index.to_numpy()]
    plt.pie(frame, labels=labels, colors=colors, radius=radius, wedgeprops=dict(width=width, edgecolor='w'), labeldistance=0.7)
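
The colors of each ring come from the cumulative share that precedes every slice, which is what `percentage_growth` computes. A vectorized sketch of the same idea, assuming a hypothetical helper name `cumulative_start_fractions`:

```python
import numpy as np

def cumulative_start_fractions(values):
    """Fraction of the total that precedes each item: [0, v0/T, (v0+v1)/T, ...]."""
    values = np.asarray(values, dtype=float)
    starts = np.concatenate(([0.0], np.cumsum(values)[:-1]))
    return starts / values.sum()

fractions = cumulative_start_fractions([50, 30, 20])
print(fractions)
```

Feeding these fractions to a colormap, as the cell does with `cmap(...)`, places each slice at a color proportional to where it starts on the ring.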

In [None]:
plt.rcParams["figure.figsize"] = [8,8]
plt.rc('font', size=12)
df=df_dropped
layers = ['country','region']
value = 'amount'
minimum_value = 2 # Skip values less than minimum

df['amount'] = [1 for x in range(len(df))]
city = df['region'].value_counts()
city = city[city > minimum_value]
df_big = df.loc[df['region'].isin(city.index)]   # reject small cities

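def percentage_growth(l):
    # Cumulative share of the total that precedes each slice (starts at 0)
    values = list(l)   # positional access, independent of the index labels
    total = sum(values)
    s = 0
    res = [0]
    for v in values[:-1]:
        s += v
        res.append(s / total)
    return res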
plt.axis("equal")

cmap = plt.get_cmap("rainbow")
df_bigOld=df_big
for x in df.continent.unique():
    df_big=df_bigOld[df_bigOld.continent==x]
    for i, layer in enumerate(layers):
        radius = i + 2
        width = 1
        frame = df_big.groupby(layers[:i+1])['amount'].sum()
        colors = cmap(percentage_growth(frame))
        labels = [x[-1] if isinstance(x, tuple) else x for x in frame.index.to_numpy()]
        plt.pie(frame, labels=labels, colors=colors, radius=radius, wedgeprops=dict(width=width, edgecolor='w'), labeldistance=0.8)
    plt.show()

### Scatter Plot
----------scatter plot 2D e 3D	quantitative
"""use density plots or plots based on binning."""

#### 2D (WORK)
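
The note at the top of this section recommends density plots or binning when a scatter plot has too many points. A minimal sketch of 2D binning with NumPy on synthetic data; the sample and the 20×20 grid are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
# Two correlated synthetic variables standing in for a dense scatter
x = rng.normal(size=10_000)
y = 0.5 * x + rng.normal(scale=0.5, size=10_000)

# Count how many points fall into each cell of a 20 x 20 grid
counts, x_edges, y_edges = np.histogram2d(x, y, bins=20)
print(counts.shape, int(counts.sum()))
```

Matplotlib's `plt.hist2d` and `plt.hexbin` draw this kind of binned view directly, which tends to stay readable where overlapping markers do not.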

In [None]:
#-----------2D

#option1
for x in df_dropped.columns:
    for y in df_dropped.columns:
        if x != y:
            plt.figure(figsize = (15,11))
            print("\nScatter plot 2D for " + x + " and " + y + " ::")
            plt.scatter(df_dropped[x], df_dropped[y], s=5)
            plt.show()
            print()

In [None]:
#option3 
for x in df_dropped:
    for y in df_dropped:
        if (x != "continent") & (y != "continent") & (x != y):
            plt.rcParams.update({'font.size': 7})
            plt.rcParams["figure.figsize"] = (15, 10)
            plt.scatter(df_dropped[df_dropped['continent'] == 'Europe'][x], df_dropped[df_dropped['continent'] == 'Europe'][y], s=10, color='b', label='Europe')
            plt.scatter(df_dropped[df_dropped['continent'] == 'Oceania'][x], df_dropped[df_dropped['continent'] == 'Oceania'][y], s=100, color='r', marker='*', label='Oceania')
            plt.scatter(df_dropped[df_dropped['continent'] == 'America'][x], df_dropped[df_dropped['continent'] == 'America'][y], s=70, color='y', marker='+', label='America')
            plt.xlabel(x)
            plt.ylabel(y)
            plt.legend(scatterpoints=1,
                loc='lower left',
                ncol=1,
                fontsize=10)
            #plt.savefig('D:\\Dropbox\\Scuola\\Pisa\\Anno2\\Data Mining\\Esame\\Pratica\\python\\NewDataset\\scatter2.jpg')
            plt.show()

#### Matrix (WORK)

In [None]:
#--------MATRIX
#option ALL-IN --------------
"""pd.plotting.scatter_matrix(df_join, marker='+', figsize = (20,20))
plt.show()"""

#option2
#sns.set_theme(style="ticks")
sns.pairplot(df_dropped, hue="continent")

In [None]:
"""def corrdot(*args, **kwargs):
    corr_r = args[0].corr(args[1], 'pearson')
    corr_text = round(corr_r, 2)
    ax = plt.gca()
    font_size = abs(corr_r) * 80 + 5
    ax.annotate(corr_text, [.5, .5,],  xycoords="axes fraction", ha='center', va='center', fontsize=font_size)

def corrfunc(x, y, **kws):
    r, p = stats.pearsonr(x, y)
    p_stars = ''
    if p <= 0.05:
        p_stars = '*'
    if p <= 0.01:
        p_stars = '**'
    if p <= 0.001:
        p_stars = '***'
    ax = plt.gca()
    ax.annotate(p_stars, xy=(0.65, 0.6), xycoords=ax.transAxes,
                color='red', fontsize=70)

sns.set(style='white', font_scale=1.6)

g = sns.PairGrid(df_dropped, aspect=1.5, diag_sharey=False, despine=False)
g.map_lower(sns.regplot, lowess=True, ci=False,
            line_kws={'color': 'red', 'lw': 1},
            scatter_kws={'color': 'black', 's': 20})
g.map_diag(sns.distplot, color='black',
           kde_kws={'color': 'red', 'cut': 0.7, 'lw': 1},
           hist_kws={'histtype': 'bar', 'lw': 2,
                     'edgecolor': 'k', 'facecolor':'grey'})
g.map_diag(sns.rugplot, color='black')
g.map_upper(corrdot)
g.map_upper(corrfunc)
g.fig.subplots_adjust(wspace=0, hspace=0)

# Remove axis labels
for ax in g.axes.flatten():
    ax.set_ylabel('')
    ax.set_xlabel('')

# Add titles to the diagonal axes/subplots
for ax, col in zip(np.diag(g.axes), df_dropped.columns):
    ax.set_title(col, y=0.82, fontsize=26)"""

### Multiple Line charts

In [None]:
# Set up a grid of plots:
fig = plt.figure(figsize=(25, 5))
#fig_dims = (2, 1)
fig.subplots_adjust(hspace=0.7)

# # Plot mean sales 
# plt.subplot2grid(fig_dims, (0, 0))
# monthly_sales_mean = df_join.groupby([df_join.year, df_join.month]).mean().plot()
# plt.title('Mean value of Sale over the months: ', fontsize='x-large')
# plt.xlabel('Year-Month', fontsize='x-large')
# plt.ylabel('Mean Sale', fontsize='x-large')
# plt.tick_params(labelsize='x-large')

# Plot total sales
#plt.subplot2grid(fig_dims, (1, 0))
monthly_sales_total = df_dropped[['sales_usd','sales_currency']].groupby([df_dropped.time_code.dt.year,df_dropped.month]).sum().plot()
plt.title('Total value of Sale over the months: ', fontsize='small')
plt.xlabel('Year-Month', fontsize='small')
plt.ylabel('Sum of Sale', fontsize='small')
plt.tick_params(labelsize='small')

plt.show()

In [None]:
plt.rcParams["figure.figsize"] = [30, 10]
plt.style.use('Solarize_Light2')

legend=[]
for x in df_join.time_code.dt.year.unique():
    tmp=df_join[df_join.time_code.dt.year ==x]
    for y in tmp.time_code.dt.month.unique():
        legend.append((x,y))
array=df_join.columns
print(legend)
for focus in array:
    if focus not in ('time_code', 'year', 'id_ram', 'month', 'week', 'sales_usd', 'sales_currency', 'currency'):
        if focus == 'continent':
            max_val=3
        else:
            max_val=5
        
        print("PLOTTING FOR: "+focus)

        plt.figure(figsize=(20, 5))
        tmp1=df_join.groupby(focus)[focus].count()
        tmp1=tmp1.sort_values(ascending=False)
        test=tmp1.index.values.tolist()
        plt.title("Distribution of "+focus)
        plt.xlim([0,len(legend)])
        plt.xlabel("Year and month")
        plt.ylabel("Sales")
       
        plt.xticks(np.arange(len(legend)),legend)
        plt.tick_params(labelrotation=90, labelsize='small')

        #print('legend len:'+str(len(legend)))

        for x in test[0:max_val]:
            tmp=df_join[df_join[focus]==x]
            #tmp.groupby([df_join.year, df_join.month])[focus].count().plot()
            tmp2=tmp.groupby([df_join.time_code.dt.year, df_join.month])[focus].count().reset_index(name='count')
            minyear=tmp2.time_code.min()
            minmonth=tmp2[tmp2.time_code==minyear]
            minmonth=minmonth.month.min()
            index=(minyear,minmonth)
            num_index=legend.index(index)
            values=tmp.groupby([df_join.time_code.dt.year, df_join.month])[focus].count().values.tolist()
            #print(values)
            #print(num_index)
            plt.plot(np.arange(num_index,len(values)+num_index),values)
            #print(len(tmp.groupby([df_join.year, df_join.month])[focus].count().values.tolist()))

        plt.legend(test[0:max_val])
        plt.show()


## Data Quality improvement

Determine the data quality/accuracy:
- explanation of how outliers are treated
- quality/accuracy of the attributes


Forms of aggregation:
- reduction/sampling

Reduction of attributes: some attributes are removed

In [None]:
df_dropped.drop(['month'], inplace=True, axis=1)

Discretization & binarization (maybe move to the beginning of clustering)

Attribute transformation: logarithmic

PCA: since the categorical attributes are dropped, only sales_currency remains as a numeric attribute (or at most the conversion rates)

In [None]:
#-----------------PCA 
df_num = df_dropped.drop(columns=cat)
df_temp = df_num.sample(5000)
df_temp = StandardScaler().fit_transform(df_temp)
pca = PCA(n_components=1)
x_pca = pca.fit_transform(df_temp)

print("original shape: ", df_temp.shape)
print("transformed shape:", x_pca.shape)

x_new = pca.inverse_transform(x_pca)

plt.figure(figsize = (15,7))
plt.style.use('ggplot')
plt.scatter(df_temp[:, 0], df_temp[:, 1], alpha=0.2, s=20, c="r")
plt.scatter(x_new[:, 0], x_new[:, 1] , s=20, alpha=0.8, c="b")  
plt.title("PCA")
plt.xlabel("1st standardized attribute")
plt.ylabel("2nd standardized attribute")
plt.show()

print(pca.explained_variance_ratio_)
# explained_variance_ratio_ holds one value per retained component, so with
# n_components=1 only the share of the first principal component is printed.
# The figures noted for this analysis (first principal component 72.77% of the
# variance, second 23.03%, 95.80% of the information together) need a fit with
# n_components=2 to be reproduced.
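
To make the meaning of the explained variance ratio concrete, here is a small sketch that computes it by hand with NumPy. The two-column synthetic data set is an assumption for illustration; the ratio is obtained from the eigenvalues of the covariance matrix of the standardized data, which is what the attribute of scikit-learn's `PCA` reports for each retained component.

```python
import numpy as np

# Synthetic two-column data, illustrative only
rng = np.random.default_rng(42)
a = rng.normal(size=500)
data = np.column_stack([a, 0.8 * a + rng.normal(scale=0.3, size=500)])

# Standardize each column (zero mean, unit variance), as StandardScaler does
z = (data - data.mean(axis=0)) / data.std(axis=0)

# Eigenvalues of the covariance matrix, largest first
eigvals = np.linalg.eigvalsh(np.cov(z, rowvar=False))[::-1]

# Share of the total variance carried by each principal component
explained_variance_ratio = eigvals / eigvals.sum()
print(explained_variance_ratio)
```

The ratios always sum to 1 when every component is kept, so the first entry alone tells how much is lost when the data are reduced to a single component.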

## Data Transformation
Possible transformations:
- add the exchange rate, to understand how prices changed with the fluctuation of the currency


In [None]:
price_conversion=[]
tmp1=df_dropped.sales_usd.tolist()
tmp2=df_dropped.sales_currency.tolist()
for index,x in enumerate(tmp1):
    price_conversion.append(x/tmp2[index])
df_dropped['conversion_rate']=price_conversion
df_dropped["conversion_rate"].head()
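
The element-by-element loop above can also be written as a single column operation. This is a sketch on a toy frame; the three rows are invented for illustration, only the column names follow the notebook.

```python
import pandas as pd

# Toy rows, illustrative only (column names follow the notebook)
df = pd.DataFrame({
    "currency": ["EUR", "EUR", "AUD"],
    "sales_usd": [110.0, 55.0, 70.0],
    "sales_currency": [100.0, 50.0, 100.0],
})

# Element-wise division: USD obtained per unit of local currency
df["conversion_rate"] = df["sales_usd"] / df["sales_currency"]

print(df)
```

One difference worth knowing: a zero in `sales_currency` raises `ZeroDivisionError` in the Python loop, while the column division silently produces `inf`, so a check for zeros may be needed beforehand.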

In [None]:
df_dropped.drop(["sales_currency"], inplace=True, axis=1)

In [None]:
legend=[]
for x in df_join.time_code.dt.year.unique():
    tmp=df_join[df_join.time_code.dt.year ==x]
    for y in tmp.time_code.dt.month.unique():
        legend.append((x,y))

focus='conversion_rate'
print("PLOTTING FOR: "+focus)

plt.style.use('Solarize_Light2')
test=df_dropped.currency.unique()
#print(test)
plt.title("Distribution of "+focus)
plt.xlim([0,len(legend)])
plt.xlabel("Year and month")
plt.ylabel("Conversion Rate ")

plt.xticks(np.arange(len(legend)),legend)
plt.tick_params(labelrotation=90, labelsize='small')

#print('legend len:'+str(len(legend)))

for x in test:
    tmp=df_dropped[df_dropped['currency']==x]
    #tmp.groupby([df_dropped.year, df_dropped.month])[focus].count().plot()
    # 'month' was dropped from df_dropped in the Data Quality section, so derive it from time_code
    tmp2=tmp.groupby([tmp.time_code.dt.year, tmp.time_code.dt.month.rename('month')])['conversion_rate'].mean()
    minyear=tmp2.index.get_level_values('time_code').min()
    minmonth=tmp2[tmp2.index.get_level_values('time_code')==minyear]
    minmonth=minmonth.index.get_level_values('month').min()
    index=(minyear,minmonth)
    num_index=legend.index(index)
    values=tmp2.tolist()
    #print(values)
    #print(num_index)
    plt.plot(np.arange(num_index,len(values)+num_index),values)
    #print(len(tmp.groupby([df_dropped.year, df_dropped.month])[focus].count().values.tolist()))
    
print_legend=[]
for x in test:
    print_legend.append('USD/'+x)
plt.legend(print_legend)
plt.show()

## New Dataset/Feature extraction

### BY VENDORS

In [None]:
VENDORs = df_dropped['vendor'].unique()
vs = pd.DataFrame(VENDORs)
vs.set_index(0, inplace=True)
vs.index.names = ['vendor']

vs_cat = vs.copy()

#### Number of sales in each continent

In [None]:
test=df_dropped[df_dropped.continent=="Europe"].groupby(df_dropped.vendor)['sales_usd'].count()
vs['sales_Europe']=test
vs['sales_Europe'] = vs['sales_Europe'].fillna(0)

test=df_dropped[df_dropped.continent=="Oceania"].groupby(df_dropped.vendor)['sales_usd'].count()
vs['sales_Oceania']=test
vs['sales_Oceania'] = vs['sales_Oceania'].fillna(0)

test=df_dropped[df_dropped.continent=="America"].groupby(df_dropped.vendor)['sales_usd'].count()
vs['sales_America']=test
vs['sales_America'] = vs['sales_America'].fillna(0)


print(vs)

#### I: total number of ram (id_ram) purchased by each vendor

In [None]:
I = df_dropped.groupby(df_dropped['vendor'])["id_ram"].count()

I = pd.DataFrame(I)

# add column to the new dataset
vs['I'] = I["id_ram"]

print(vs)

#### Iu: number of distinct items purchased by the vendor

In [None]:
Iu = df_dropped.groupby(df_dropped['vendor'])["id_ram"].nunique()

Iu = pd.DataFrame(Iu)

# add column to the new dataset
vs['Iu'] = Iu["id_ram"]

print(vs)
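
The difference between `I` (all rows) and `Iu` (distinct items) can be seen on a toy frame. This is a sketch with invented data; it uses pandas named aggregation to compute both columns in one pass.

```python
import pandas as pd

# Toy transactions, illustrative only
df = pd.DataFrame({
    "vendor": ["A", "A", "A", "B"],
    "id_ram": [1, 1, 2, 3],
})

# I counts every row, Iu counts each id_ram once per vendor
stats = df.groupby("vendor")["id_ram"].agg(I="count", Iu="nunique")
print(stats)
```

Vendor `A` has three rows but only two distinct `id_ram` values, so `I` and `Iu` differ for it and coincide for vendor `B`.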

#### Imax: maximum number of items purchased by the vendor within a single shopping session

In [None]:
Imax = df_dropped.groupby(['vendor', "time_code"])["id_ram"].count().groupby(level="vendor").max()

"""for x in df_dropped["time_code"].unique():
    df_temp = df_dropped[(df_dropped["vendor"]=="ARLT") & (df_dropped["time_code"]==x)]
    if len(df_temp) > 45:
        print(df_temp)
        print()
        print(df_temp.describe())"""

#print(Imax)
Imax = pd.DataFrame(Imax)

#print(Imax)
# add column to the new dataset
vs['Imax'] = Imax["id_ram"]

print(vs)

#### Imin: minimum number of items purchased by the vendor within a single shopping session

In [None]:
Imin = df_dropped.groupby(['vendor', "time_code"])["id_ram"].count().groupby(level="vendor").min()

"""for x in df_dropped["time_code"].unique():
    df_temp = df_dropped[(df_dropped["vendor"]=="ARLT") & (df_dropped["time_code"]==x)]
    if len(df_temp) > 45:
        print(df_temp)
        print()
        print(df_temp.describe())"""

#print(Imax)
Imin = pd.DataFrame(Imin)

#print(Imin)
# add column to the new dataset
vs['Imin'] = Imin["id_ram"]

print(vs)

#### Iavg: average number of items purchased by the vendor within a single shopping session

In [None]:
Iavg = df_dropped.groupby(['vendor', "time_code"])["id_ram"].count().groupby(level="vendor").mean()

#print(Iavg)
Iavg = pd.DataFrame(Iavg)

#print(Iavg)
# add column to the new dataset
vs['Iavg'] = Iavg["id_ram"]

print(vs)
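
Imax, Imin and Iavg all follow the same two-step pattern: count the items of every (vendor, session) pair, then summarize the sessions of each vendor. The sketch below shows the pattern on invented data, treating each distinct `time_code` as one session as the cells above do.

```python
import pandas as pd

# Toy transactions, illustrative only
df = pd.DataFrame({
    "vendor": ["A", "A", "A", "B", "B"],
    "time_code": pd.to_datetime(
        ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-01", "2020-01-03"]),
    "id_ram": [1, 2, 1, 3, 3],
})

# Step 1: number of items in each (vendor, session) pair
per_session = df.groupby(["vendor", "time_code"])["id_ram"].count()

# Step 2: summarize the sessions of each vendor
summary = per_session.groupby(level="vendor").agg(["max", "min", "mean"])
summary.columns = ["Imax", "Imin", "Iavg"]
print(summary)
```

Grouping on the index level with `groupby(level="vendor")` is the form that keeps working in recent pandas releases, where the `level=` argument of `max`, `min` and `mean` is no longer available.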

#### Ep: Shannon's Entropy on the types of products purchased by the vendor (id_ram)

In [None]:
# Shannon entropy (in bits) of the empirical distribution of `values`
def estimate_shannon_entropy(values):
    m = len(values)
    # occurrences of each distinct value
    IDs = collections.Counter([value for value in values])
    shannon_entropy_value = 0
    for ID in IDs:
        # number of occurrences of this value
        n_i = IDs[ID]
        # relative frequency: n_i / m
        p_i = n_i / float(m)
        entropy_i = p_i * (math.log(p_i, 2))
        shannon_entropy_value += entropy_i
    if shannon_entropy_value == 0:
        return 0
    return shannon_entropy_value * -1

Ep = df_dropped.groupby('vendor')["id_ram"].apply(estimate_shannon_entropy)

#print(Ep)
# create dataframe
Ep = pd.DataFrame(Ep)

#print(Ep)
# add column to the new dataset
vs['Ep'] = Ep

print(vs)
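
As a quick sanity check of the entropy feature, the formula H = -Σ p_i · log2(p_i) can be evaluated on two extreme cases. The helper below is a compact standard-library sketch written for this illustration (it is not the notebook's function), and the product labels are invented.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Entropy in bits of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())

only_one_product = shannon_entropy(["a", "a", "a", "a"])
four_equal_products = shannon_entropy(["a", "b", "c", "d"])
print(only_one_product, four_equal_products)
```

A vendor that always sells the same item gets entropy 0, while a vendor whose sales are spread evenly over four items gets log2(4) = 2 bits, so higher values of `Ep` indicate a more varied catalogue.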

#### Eb: Shannon's Entropy on the frequency and extent of the vendor's shopping sessions (time_code)

In [None]:
Eb = df_dropped.groupby('vendor')["time_code"].apply(estimate_shannon_entropy)

#print(Eb)
# create dataframe
Eb = pd.DataFrame(Eb)

#print(Eb)
# add column to the new dataset
vs['Eb'] = Eb

print(vs)

#### Ew: Shannon's Entropy on the weekday of the vendor's purchases (time_code.dt.day_name)

In [None]:
Ew = df_dropped.copy()
Ew['time_code'] = Ew.time_code.dt.day_name(locale = 'English')
Ew = Ew.groupby('vendor')["time_code"].apply(estimate_shannon_entropy)

#print(Ew)
# create dataframe
Ew = pd.DataFrame(Ew)

#print(Ew)
# add column to the new dataset
vs['Ew'] = Ew

print(vs)

#### Em: Shannon's Entropy on the month of the vendor's purchases (time_code.dt.month_name)

In [None]:
Em = df_dropped.copy()
Em['time_code'] = Em.time_code.dt.month_name(locale = 'English')
Em = Em.groupby('vendor')["time_code"].apply(estimate_shannon_entropy)

#print(Em)
# create dataframe
Em = pd.DataFrame(Em)

#print(Em)
# add column to the new dataset
vs['Em'] = Em

print(vs)

#### Stot: total amount spent by each vendor (USD)

In [None]:
Stot = df_dropped.groupby('vendor')["sales_usd"].sum()

#print(Stot)
Stot = pd.DataFrame(Stot)

#print(Stot)
# add column to the new dataset
vs['Stot_USD'] = Stot["sales_usd"]

print(vs)

#### Smax: maximum amount spent by each vendor within a single shopping session

In [None]:
Smax = df_dropped.groupby(['vendor', "time_code"])["sales_usd"].sum().groupby(level="vendor").max()

#print(Smax)
Smax = pd.DataFrame(Smax)

#print(Smax)
# add column to the new dataset
vs['Smax_USD'] = Smax["sales_usd"]

print(vs)

#### Savg: average amount spent by each vendor within a single shopping session

In [None]:
Savg = df_dropped.groupby(['vendor', "time_code"])["sales_usd"].sum().groupby(level="vendor").mean()

#print(Savg)
Savg = pd.DataFrame(Savg)

#print(Savg)
# add column to the new dataset
vs['Savg_USD'] = Savg["sales_usd"]

print(vs)

#### SWmax: maximum amount spent by each vendor within a week

In [None]:
SWmax = df_dropped.groupby(['vendor', df_dropped["time_code"].dt.isocalendar().week])["sales_usd"].sum().groupby(level="vendor").max()

#print(SWmax)
SWmax = pd.DataFrame(SWmax)

#print(SWmax)
# add column to the new dataset
vs['SWmax_USD'] = SWmax["sales_usd"]

print(vs)

#### SWavg: average amount spent by each vendor within a week

In [None]:
SWavg = df_dropped.groupby(['vendor', df_dropped["time_code"].dt.isocalendar().week])["sales_usd"].sum().groupby(level="vendor").mean()

#print(SWavg)
SWavg = pd.DataFrame(SWavg)

#print(SWavg)
# add column to the new dataset
vs['SWavg_USD'] = SWavg["sales_usd"]

print(vs)
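
The weekly features depend on how a week is identified. The sketch below uses invented data and groups on both the ISO year and the ISO week; including the year is an assumption that differs from the cells above, which group on the week number alone and therefore merge equally numbered weeks of different years.

```python
import pandas as pd

# Toy transactions, illustrative only
df = pd.DataFrame({
    "vendor": ["A", "A", "A", "B"],
    "time_code": pd.to_datetime(
        ["2020-01-06", "2020-01-08", "2020-01-13", "2020-01-07"]),
    "sales_usd": [10.0, 20.0, 5.0, 7.0],
})

# ISO calendar columns: year, week, day
iso = df["time_code"].dt.isocalendar()

# Total spend of each vendor in each (ISO year, ISO week)
weekly = df.groupby(["vendor", iso["year"], iso["week"]])["sales_usd"].sum()

SWmax = weekly.groupby(level="vendor").max()
SWavg = weekly.groupby(level="vendor").mean()
print(SWmax, SWavg, sep="\n")
```

`Series.dt.isocalendar()` is the replacement for the older `Series.dt.week` accessor, which is not available in recent pandas releases.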

#### SMmax: maximum amount spent by each vendor within a month

In [None]:
SMmax = df_dropped.groupby(['vendor', df_dropped.time_code.dt.month])["sales_usd"].sum().groupby(level="vendor").max()

#print(SMmax)
SMmax = pd.DataFrame(SMmax)

#print(SMmax)
# add column to the new dataset
vs['SMmax_USD'] = SMmax["sales_usd"]

print(vs)

#### SMavg: average amount spent by each vendor within a month

In [None]:
SMavg = df_dropped.groupby(['vendor', df_dropped.time_code.dt.month])["sales_usd"].sum().groupby(level="vendor").mean()

#print(SMavg)
SMavg = pd.DataFrame(SMavg)

#print(SMavg)
# add column to the new dataset
vs['SMavg_USD'] = SMavg["sales_usd"]

print(vs)

#### NSess: number of shopping sessions

In [None]:
NSess = df_dropped.groupby('vendor')["time_code"].nunique()

#print(NSess)
NSess = pd.DataFrame(NSess)

#print(NSess)
# add column to the new dataset
vs['NSess'] = NSess["time_code"]

print(vs)

#### Cont_Max: number of distinct continents for each vendor

In [None]:
Cont_Max = df_dropped.groupby(df_dropped['vendor'])["continent"].nunique()

Cont_Max = pd.DataFrame(Cont_Max)

# add column to the new dataset
vs['Cont_Max'] = Cont_Max["continent"]
vs['Cont_Max'] = vs['Cont_Max'].fillna(0)

print(vs)

#### Dim_unik: number of distinct memory sizes (memory_dim) for each vendor

In [None]:
Dim_unik = df_dropped.groupby(df_dropped['vendor'])["memory_dim"].nunique()

Dim_unik = pd.DataFrame(Dim_unik)

# add column to the new dataset
vs['Dim_unik'] = Dim_unik["memory_dim"]
vs['Dim_unik'] = vs['Dim_unik'].fillna(0)

print(vs)

#### Type_unik: number of distinct memory types (memory_type) for each vendor

In [None]:
Type_unik = df_dropped.groupby(df_dropped['vendor'])["memory_type"].nunique()

Type_unik = pd.DataFrame(Type_unik)

# add column to the new dataset
vs['Type_unik'] = Type_unik["memory_type"]
vs['Type_unik'] = vs['Type_unik'].fillna(0)

print(vs)

#### clok_unik: number of distinct clock values (clock) for each vendor

In [None]:
clok_unik = df_dropped.groupby(df_dropped['vendor'])["clock"].nunique()

clok_unik = pd.DataFrame(clok_unik)

# add column to the new dataset
vs['clok_unik'] = clok_unik["clock"]
vs['clok_unik'] = vs['clok_unik'].fillna(0)

print(vs)

#### coun_unik: number of distinct countries for each vendor

In [None]:
coun_unik = df_dropped.groupby(df_dropped['vendor'])["country"].nunique()

coun_unik = pd.DataFrame(coun_unik)

# add column to the new dataset
vs['coun_unik'] = coun_unik["country"]
vs['coun_unik'] = vs['coun_unik'].fillna(0)

print(vs)

#### reg_unik: number of distinct regions for each vendor

In [None]:
reg_unik = df_dropped.groupby(df_dropped['vendor'])["region"].nunique()

reg_unik = pd.DataFrame(reg_unik)

# add column to the new dataset
vs['reg_unik'] = reg_unik["region"]
vs['reg_unik'] = vs['reg_unik'].fillna(0)

print(vs)

#### day_unik: number of distinct days of the month (time_code.dt.day) for each vendor

In [None]:
df_with_day = df_dropped.copy()
df_with_day['time_code'] = df_with_day.time_code.dt.day

day_unik = df_with_day.groupby(df_with_day['vendor'])["time_code"].nunique()

day_unik = pd.DataFrame(day_unik)

# add column to the new dataset
vs['day_unik'] = day_unik["time_code"]
vs['day_unik'] = vs['day_unik'].fillna(0)

print(vs)

#### week_unik: number of distinct weeks of the year for each vendor

In [None]:
df_with_week = df_dropped.copy()
df_with_week['time_code'] = df_with_week.time_code.dt.isocalendar().week

week_unik = df_with_week.groupby(df_with_week['vendor'])["time_code"].nunique()

week_unik = pd.DataFrame(week_unik)

# add column to the new dataset
vs['week_unik'] = week_unik["time_code"]
vs['week_unik'] = vs['week_unik'].fillna(0)

print(vs)

#### price_unik: number of distinct prices (sales_usd) for each vendor

In [None]:
price_unik = df_dropped.groupby(df_dropped['vendor'])["sales_usd"].nunique()

price_unik = pd.DataFrame(price_unik)

# add column to the new dataset
vs['price_unik'] = price_unik["sales_usd"]
vs['price_unik'] = vs['price_unik'].fillna(0)

print(vs)

#### Rate_avg: average conversion rate for each vendor

In [None]:
Rate_avg = df_dropped.groupby(df_dropped['vendor'])["conversion_rate"].mean()

Rate_avg = pd.DataFrame(Rate_avg)

# add column to the new dataset
vs['Rate_avg'] = Rate_avg
vs['Rate_avg'] = vs['Rate_avg'].fillna(0)

print(vs)

#### Categorical Attributes

##### Country: Country associated with the majority of the vendor's transactions

In [None]:
# most frequent country for each vendor (first one in sorted order on ties)
Country = df_dropped.groupby('vendor').country.agg(lambda s: s.mode().iloc[0])

print(Country.head())
# create dataframe
Country = pd.DataFrame(Country)

print(Country)
# add column to the new dataset
vs_cat['Country'] = Country

print(vs_cat)

##### Fav_weekday: day of the week during which the customer tends to spend the most

In [None]:
df_with_day_week = df_dropped.copy()
df_with_day_week['time_code'] = df_with_day_week.time_code.dt.day_name(locale = 'English')

spent_per_day_week = df_with_day_week.groupby(['vendor', 'time_code']).sales_usd.sum()
# weekday with the highest total spend for each vendor
Fav_weekday = spent_per_day_week.groupby(level='vendor').idxmax().apply(lambda key: key[1]).to_dict()

# create dataframe
Fav_weekday = pd.DataFrame.from_dict(Fav_weekday, orient='index')

# add column to the new dataset
vs_cat['Fav_weekday'] = Fav_weekday

print(vs_cat)
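
Picking the "favourite" category of a group means selecting the label with the largest aggregated value, not the first label in sorted order. The sketch below illustrates this with `idxmax` on invented data; the column name `weekday` is an assumption made for the example.

```python
import pandas as pd

# Toy transactions, illustrative only
df = pd.DataFrame({
    "vendor": ["A", "A", "A", "B", "B"],
    "weekday": ["Friday", "Monday", "Monday", "Sunday", "Tuesday"],
    "sales_usd": [10.0, 30.0, 5.0, 8.0, 9.0],
})

# Total spend per (vendor, weekday)
spent = df.groupby(["vendor", "weekday"])["sales_usd"].sum()

# idxmax returns the (vendor, weekday) label of the largest total per vendor;
# the second element of that label is the weekday
fav_weekday = spent.groupby(level="vendor").idxmax().apply(lambda key: key[1])
print(fav_weekday)
```

For vendor `A` the first weekday in alphabetical order is Friday, but the largest total falls on Monday, which is the value `idxmax` selects.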

##### fav_month: month during which the customer tends to spend the most

In [None]:
df_with_months = df_dropped.copy()
df_with_months['time_code'] = df_with_months.time_code.dt.month_name(locale = 'English')

spent_per_month = df_with_months.groupby(['vendor', 'time_code']).sales_usd.sum()
# month with the highest total spend for each vendor
Fav_month = spent_per_month.groupby(level='vendor').idxmax().apply(lambda key: key[1]).to_dict()

# create dataframe
Fav_month = pd.DataFrame.from_dict(Fav_month, orient='index')

# add column to the new dataset
vs_cat['Fav_month'] = Fav_month

print(vs_cat)

##### fav_dim: memory size (memory_dim) that appears most often for the vendor

In [None]:
fav_dim = df_dropped.groupby(['vendor', 'memory_dim'])['memory_dim'].count()

# for each vendor keep the memory_dim with the highest count (first one on ties)
Fav = fav_dim.groupby(level='vendor').idxmax().apply(lambda key: key[1]).to_dict()

#print(Fav)

# create dataframe
Fav = pd.DataFrame.from_dict(Fav, orient='index')


# add column to the new dataset
vs_cat['fav_dim'] = Fav
vs_cat['fav_dim'] = vs_cat['fav_dim'].fillna(0)

print(vs_cat)

In [None]:
fav_dim = df_dropped.groupby(['vendor', 'memory_type'])['memory_type'].count().reset_index(name="count")
tmp = fav_dim[fav_dim["vendor"]=="geizhals_unknown"]
print(tmp)

##### fav_price: price (sales_usd) that appears most often for the vendor

In [None]:
fav_price = df_dropped.groupby(['vendor', 'sales_usd'])['sales_usd'].count()

# for each vendor keep the price with the highest count (first one on ties)
Fav = fav_price.groupby(level='vendor').idxmax().apply(lambda key: key[1]).to_dict()


#print(Fav)

# create dataframe
Fav = pd.DataFrame.from_dict(Fav, orient='index')


# add column to the new dataset
vs_cat['fav_price'] = Fav
vs_cat['fav_price'] = vs_cat['fav_price'].fillna(0)

print(vs_cat)

##### fav_model: RAM model (ram_model) that appears most often for the vendor

In [None]:
fav_model = df_dropped.groupby(['vendor', 'ram_model'])['ram_model'].count()

# for each vendor keep the ram_model with the highest count (first one on ties)
Fav = fav_model.groupby(level='vendor').idxmax().apply(lambda key: key[1]).to_dict()


#print(Fav)

# create dataframe
Fav = pd.DataFrame.from_dict(Fav, orient='index')


# add column to the new dataset
vs_cat['fav_model'] = Fav
vs_cat['fav_model'] = vs_cat['fav_model'].fillna(0)

print(vs_cat)

##### fav_mem_type: memory type (memory_type) that appears most often for the vendor

In [None]:
fav_mem_type = df_dropped.groupby(['vendor', 'memory_type'])['memory_type'].count()

# for each vendor keep the memory_type with the highest count (first one on ties)
Fav = fav_mem_type.groupby(level='vendor').idxmax().apply(lambda key: key[1]).to_dict()


#print(Fav)

# create dataframe
Fav = pd.DataFrame.from_dict(Fav, orient='index')


# add column to the new dataset
vs_cat['fav_mem_type'] = Fav
vs_cat['fav_mem_type'] = vs_cat['fav_mem_type'].fillna(0)

print(vs_cat)

##### fav_clock: clock value (clock) that appears most often for the vendor

In [None]:
fav_clock = df_dropped.groupby(['vendor', 'clock'])['clock'].count()

# for each vendor keep the clock with the highest count (first one on ties)
Fav = fav_clock.groupby(level='vendor').idxmax().apply(lambda key: key[1]).to_dict()


#print(Fav)

# create dataframe
Fav = pd.DataFrame.from_dict(Fav, orient='index')


# add column to the new dataset
vs_cat['fav_clock'] = Fav
vs_cat['fav_clock'] = vs_cat['fav_clock'].fillna(0)

print(vs_cat)

##### fav_continent: continent that appears most often for the vendor

In [None]:
fav_continent = df_dropped.groupby(['vendor', 'continent'])['continent'].count()

# for each vendor keep the continent with the highest count (first one on ties)
Fav = fav_continent.groupby(level='vendor').idxmax().apply(lambda key: key[1]).to_dict()


#print(Fav)

# create dataframe
Fav = pd.DataFrame.from_dict(Fav, orient='index')


# add column to the new dataset
vs_cat['fav_continent'] = Fav
vs_cat['fav_continent'] = vs_cat['fav_continent'].fillna(0)

print(vs_cat)

##### fav_region: region that appears most often for the vendor

In [None]:
fav_region = df_dropped.groupby(['vendor', 'region'])['region'].count()

# for each vendor keep the region with the highest count (first one on ties)
Fav = fav_region.groupby(level='vendor').idxmax().apply(lambda key: key[1]).to_dict()


#print(Fav)

# create dataframe
Fav = pd.DataFrame.from_dict(Fav, orient='index')


# add column to the new dataset
vs_cat['fav_region'] = Fav
vs_cat['fav_region'] = vs_cat['fav_region'].fillna(0)

print(vs_cat)

##### Conv_most: conversion rate that appears most often for the vendor

In [None]:
Conv_most = df_dropped.groupby(['vendor', 'conversion_rate'])['conversion_rate'].count()

# for each vendor keep the conversion_rate with the highest count (first one on ties)
Fav = Conv_most.groupby(level='vendor').idxmax().apply(lambda key: key[1]).to_dict()


#print(Fav)

# create dataframe
Fav = pd.DataFrame.from_dict(Fav, orient='index')


# add column to the new dataset
vs_cat['Conv_most'] = Fav
vs_cat['Conv_most'] = vs_cat['Conv_most'].fillna(0)

print(vs_cat)

##### Conv_less: conversion rate that appears least often for the vendor

In [None]:
Conv_less = df_dropped.groupby(['vendor', 'conversion_rate'])['conversion_rate'].count()

# for each vendor keep the conversion_rate with the lowest count (first one on ties)
Fav = Conv_less.groupby(level='vendor').idxmin().apply(lambda key: key[1]).to_dict()


#print(Fav)

# create dataframe
Fav = pd.DataFrame.from_dict(Fav, orient='index')


# add column to the new dataset
vs_cat['Conv_less'] = Fav
vs_cat['Conv_less'] = vs_cat['Conv_less'].fillna(0)

print(vs_cat)

##### Conv_max: maximum conversion rate for the vendor

In [None]:
Conv_max = df_dropped.groupby('vendor')["conversion_rate"].max()

Conv_max = pd.DataFrame(Conv_max)

# add column to the new dataset
vs_cat['Conv_max'] = Conv_max
vs_cat['Conv_max'] = vs_cat['Conv_max'].fillna(0)

print(vs_cat)

##### Conv_min: minimum conversion rate for the vendor

In [None]:
Conv_min = df_dropped.groupby('vendor')["conversion_rate"].min()

Conv_min = pd.DataFrame(Conv_min)

# add column to the new dataset
vs_cat['Conv_min'] = Conv_min
vs_cat['Conv_min'] = vs_cat['Conv_min'].fillna(0)

print(vs_cat)

##### fav_week: week of the year with the most transactions for the vendor

In [None]:
df_with_week = df_dropped.copy()
df_with_week['time_code'] = df_with_week.time_code.dt.isocalendar().week

fav_week = df_with_week.groupby(['vendor', "time_code"]).time_code.count()

# for each vendor keep the week with the highest count (first one on ties)
Fav = fav_week.groupby(level='vendor').idxmax().apply(lambda key: key[1]).to_dict()



#print(Fav)

# create dataframe
Fav = pd.DataFrame.from_dict(Fav, orient='index')


# add column to the new dataset
vs_cat['fav_week'] = Fav
vs_cat['fav_week'] = vs_cat['fav_week'].fillna(0)

print(vs_cat)

#### Statistics on the Vendor dataset

In [None]:
vs = vs[(vs.I != 0)]
vs_cat = vs_cat[(vs_cat.fav_continent != 0)]


In [None]:
# stats on new dataset XXX
print(vs.describe())
print()
print(vs_cat.describe())
print()

In [None]:
# data distribution
mpl.rc('figure', max_open_warning = 0)
sturge_number = math.trunc(np.log2(len(vs)))
for col in vs:  
    print("\n\nCONSIDERATIONS ABOUT THE ATTRIBUTE: " + col)
    plt.figure(figsize = (10,8))  # create the figure before drawing on it
    vs[col].hist(bins = sturge_number + 1)  #Sturges' rule
    pl.suptitle(col)    
    #plt.savefig(dir+'\\Histogram\\'+col+'-hist.jpg')
    plt.xticks(rotation=45)
    plt.show()
    print()
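
The bin count used in the histogram cell above is `trunc(log2(n)) + 1`. Sturges' rule is more commonly stated with the logarithm rounded up, which gives one extra bin whenever `n` is not a power of two. The sketch below compares the two; both helper names are hypothetical and written only for this illustration.

```python
import math

def sturges_truncated(n):
    """Variant used in the histogram cell: trunc(log2(n)) + 1 bins."""
    return math.trunc(math.log2(n)) + 1

def sturges_ceiling(n):
    """Commonly stated form of Sturges' rule: ceil(log2(n)) + 1 bins."""
    return math.ceil(math.log2(n)) + 1

for n in (10, 100, 1000, 1024):
    print(n, sturges_truncated(n), sturges_ceiling(n))
```

For a vendor table with a few hundred rows the two variants differ by a single bin, so the choice mainly matters for very small samples.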

In [None]:
# visualization on categorical
for x in vs_cat:  
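    # take labels and heights from the same value_counts() result so they stay aligned
    counts = vs_cat[x].value_counts()
    plt.bar(counts.index.astype(str), counts.values)
    plt.xticks(rotation=90, horizontalalignment="center")
    plt.title("Distribution of " + x)
    plt.xlabel(x)
    plt.ylabel("Number of vendors")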
    plt.show()
    print()

In [None]:
tmp1= vs.copy()
tmp2= vs_cat.copy()

In [None]:
# outliers
for col in vs:  
    fig, ax = plt.subplots()
    ax.set_title('\nOutliers of ' +col+ ' in the Dataset')
    ax.boxplot(vs[col])
    plt.show()

    print("---------------------------------------------------------------")
    
def iqr_values(s):
    q1 = s.quantile(q = 0.25)

    q3 = s.quantile(q = 0.75)    

    iqr = q3 - q1

    iqr_left = q1 - 3*iqr
    
    iqr_right = q3 + 3*iqr
    
    return iqr_left, iqr_right


for x in vs:
    if (x != "Imin") & (x != "coun_unik"):
        print("OUTLIERS REPRESENTATION FOR ATTRIBUTE :\t" + x)
        left_sale, right_sale = iqr_values(vs[x])
        vs[(vs[x] > left_sale) & (vs[x] < right_sale)][x].plot.box()
        plt.show()
        print()
        outliers = vs[(vs[x] < left_sale) | (vs[x] > right_sale)]
        print("\n Outliers found")
        print()
        print(outliers.describe())
        print()
        print("--------------------------------------------------")
        print("\n Dataset dropped")
        #vs_dropped[x] = vs[x][(vs[x] > left_sale) & (vs[x] < right_sale)]
        # drop every outlier row from both vendor datasets
        vs.drop(outliers.index, inplace=True, errors="ignore")
        vs_cat.drop(outliers.index, inplace=True, errors="ignore")
        print(vs.describe())
        print()
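
The outlier rule above keeps the values inside the fences `Q1 - 3*IQR` and `Q3 + 3*IQR`. The following sketch works the rule through on ten invented numbers; the helper name `iqr_fences` and its parameter `k` are hypothetical and exist only in this example.

```python
import numpy as np

def iqr_fences(values, k=3.0):
    """Return (lower, upper) fences placed k * IQR beyond the quartiles."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Nine ordinary values and one extreme value
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)

low, high = iqr_fences(data)
outliers = data[(data < low) | (data > high)]
print(low, high, outliers)
```

With `k = 3` only the value 100 falls outside the fences. The more common choice `k = 1.5` is stricter and flags more points, so the factor 3 used in the notebook removes only the most extreme vendors.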


In [None]:
print("\nCorrelation Matrix with Spearman::")
corr = vs.corr(method='spearman')
sns.set(font_scale=0.8)
#plt.figure(figsize = (7,7))
ax = sns.heatmap(corr, vmin=-0.1, vmax=1, linewidths=1, annot=True)
ax.set_xticklabels(ax.get_xticklabels(),rotation=45,horizontalalignment='right')   #Look the method
plt.rc('figure', figsize=(18, 7))
plt.show()


# correlation
print("\nCorrelation Matrix with Pearson::")
#plt.rc('figure', figsize=(18, 7))
corr = vs.corr(method='pearson')
sns.set(font_scale=0.8)
#plt.figure(figsize = (7,7))
ax = sns.heatmap(corr, vmin=-0.1, vmax=1, linewidths=1, annot=True)
ax.set_xticklabels(ax.get_xticklabels(),rotation=45,horizontalalignment='right')   #Look the method
plt.rc('figure', figsize=(18, 7))
plt.show()


print("\nHeatmap correlation::")
sns.heatmap(vs.corr(), annot=True);

"""for x in vs:
    for y in vs:
        if x != y:
            plt.figure(figsize = (15,15))
            plt.scatter(vs[x], vs[y])
            plt.xlabel(x)
            plt.ylabel(y)
            plt.title('\nCorrelation between ' + x + ' and ' + y + ' in vs')
            plt.show()"""

#---- keep only the attributes whose correlation with every other attribute is below the threshold
# note: both members of a highly correlated pair are dropped
high_corr = []
threshold = 0.80
list_corr = list(corr.to_numpy())
ext_ind = 0
for i in list_corr:
    int_ind = 0
    for j in i:
        if j > threshold and int_ind != ext_ind and int_ind not in high_corr:
            high_corr.append(int_ind)
        int_ind += 1
    ext_ind += 1
print("Attributes above threshold of " + str(threshold) + " are:\n")
print(list(vs.columns[high_corr]))


vs.drop(columns=list(vs.columns[high_corr]), inplace=True)

# correlation
print("\nCorrelation Matrix with Pearson::")
corr = vs.corr(method='pearson')
sns.set(font_scale=0.8)
#plt.figure(figsize = (7,7))
ax = sns.heatmap(corr, vmin=-0.1, vmax=1, linewidths=1, annot=True)
ax.set_xticklabels(ax.get_xticklabels(),rotation=45,horizontalalignment='right')   #Look the method
plt.rc('figure', figsize=(18, 7))
plt.show()

In [None]:
#vs = tmp1.copy()
#vs_cat = tmp2.copy()

### BY BRAND & VENDORS

In [None]:
multind=pd.MultiIndex.from_product([df_dropped.vendor.unique().tolist(),  df_dropped.brand.unique().tolist()])
bvs = pd.DataFrame(index = multind, columns = ['Total_gain'])
bvs_cat = bvs.copy()
bvs_cat.drop(labels='Total_gain', inplace=True, axis=1)

test=df_dropped.groupby(['vendor','brand'])['sales_usd'].sum()   # group df_dropped by its own columns rather than mixing df_join with df_dropped keys
bvs['Total_gain']=test
bvs['Total_gain'] = bvs['Total_gain'].fillna(0)
"""bvs.set_index(0, inplace=True)
vs.index.names = ['vendor']"""

#### Number of sales for each continent

In [None]:
test=df_dropped[df_dropped.continent=="Europe"].groupby([df_dropped.vendor,df_dropped.brand])['sales_usd'].count()
bvs['sales_Europe']=test
bvs['sales_Europe'] = bvs['sales_Europe'].fillna(0)

test=df_dropped[df_dropped.continent=="Oceania"].groupby([df_dropped.vendor,df_dropped.brand])['sales_usd'].count()
bvs['sales_Oceania']=test
bvs['sales_Oceania'] = bvs['sales_Oceania'].fillna(0)

test=df_dropped[df_dropped.continent=="America"].groupby([df_dropped.vendor,df_dropped.brand])['sales_usd'].count()
bvs['sales_America']=test
bvs['sales_America'] = bvs['sales_America'].fillna(0)


print(bvs)
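
The three per-continent columns above can also be built in one pass. The following is a minimal sketch, assuming a frame shaped like `df_dropped`; the `toy` frame and the name `per_continent` are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Hypothetical stand-in for df_dropped, reduced to the columns used here
toy = pd.DataFrame({
    "vendor":    ["v1", "v1", "v1", "v2", "v2"],
    "brand":     ["A", "A", "B", "A", "A"],
    "continent": ["Europe", "America", "Europe", "Europe", "Europe"],
    "sales_usd": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Count sales per (vendor, brand, continent), then pivot continents into columns.
# Missing combinations are filled with 0, as the fillna(0) calls do above.
per_continent = (
    toy.groupby(["vendor", "brand", "continent"])["sales_usd"]
       .count()
       .unstack("continent", fill_value=0)
       .add_prefix("sales_")
)
print(per_continent)
```

Because the result is indexed by (vendor, brand), it covers every continent found in the data without one block of code per continent.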

#### I: total number of ram (id_ram) purchased by each vendor

In [None]:
I = df_dropped.groupby([df_dropped['vendor'], df_dropped['brand']])["id_ram"].count()

I = pd.DataFrame(I)

# add column to the new dataset
bvs['I'] = I["id_ram"]
bvs['I'] = bvs['I'].fillna(0)

print(bvs)

#### Iu: number of distinct items purchased by the vendor

In [None]:
Iu = df_dropped.groupby([df_dropped['vendor'], df_dropped['brand']])["id_ram"].nunique()

Iu = pd.DataFrame(Iu)

# add column to the new dataset
bvs['Iu'] = Iu["id_ram"]
bvs['Iu'] = bvs['Iu'].fillna(0)

print(bvs)

#### Imax: maximum number of items purchased by the vendor within a single shopping session

In [None]:
Imax = df_dropped.groupby(['vendor', "brand", "time_code"])["id_ram"].count()

Imax_list = []
for vendor in df_dropped.vendor.unique():
    for brand in df_dropped.brand.unique():
        try: Imax_list.append(Imax.loc[vendor,brand].max())
        except: Imax_list.append(0)

# add column to the new dataset
bvs['Imax'] = Imax_list
bvs['Imax'] = bvs['Imax'].fillna(0)

print(bvs)
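
The per-session statistics (Imax, and the Imin and Iavg cells that follow) all start from the same per-session count, so they can be computed together. This is a sketch under the assumption that a "session" is one distinct `time_code` per (vendor, brand); `toy`, `per_session` and `session_stats` are hypothetical names.

```python
import pandas as pd

# Hypothetical stand-in for df_dropped
toy = pd.DataFrame({
    "vendor": ["v1", "v1", "v1", "v1", "v1", "v2", "v2"],
    "brand":  ["A", "A", "A", "B", "B", "A", "A"],
    "time_code": pd.to_datetime([
        "2020-01-01", "2020-01-01", "2020-01-02",
        "2020-01-01", "2020-01-03",
        "2020-01-05", "2020-01-05",
    ]),
    "id_ram": [1, 2, 1, 3, 3, 4, 5],
})

# Items per session: one row per (vendor, brand, time_code)
per_session = toy.groupby(["vendor", "brand", "time_code"])["id_ram"].count()

# Collapse the session level and compute max, min and mean in a single pass
session_stats = per_session.groupby(level=["vendor", "brand"]).agg(
    Imax="max", Imin="min", Iavg="mean"
)
print(session_stats)
```

The result is indexed by (vendor, brand), so it can be aligned with another frame by index rather than by the position of the elements in a list.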

#### Imin: minimum number of items purchased by the vendor within a single shopping session

In [None]:
Imin = df_dropped.groupby(['vendor', "brand", "time_code"])["id_ram"].count()

Imin_list = []
for vendor in df_dropped.vendor.unique():
    for brand in df_dropped.brand.unique():
        try: Imin_list.append(Imin.loc[vendor,brand].min())
        except: Imin_list.append(0)

# add column to the new dataset
bvs['Imin'] = Imin_list
bvs['Imin'] = bvs['Imin'].fillna(0)

print(bvs)

#### Iavg: average number of items purchased by the vendor within a single shopping session

In [None]:
Iavg = df_dropped.groupby(['vendor', "brand", "time_code"])["id_ram"].count()

Iavg_list = []
for vendor in df_dropped.vendor.unique():
    for brand in df_dropped.brand.unique():
        try: Iavg_list.append(Iavg.loc[vendor,brand].mean())
        except: Iavg_list.append(0)

# add column to the new dataset
bvs['Iavg'] = Iavg_list
bvs['Iavg'] = bvs['Iavg'].fillna(0)

print(bvs)

#### Ep: Shannon's Entropy on the types of products purchased by the vendor (id_ram)

In [None]:
# Shannon entropy (in bits) of the empirical distribution of `values`
def estimate_shannon_entropy(values):
    m = len(values)
    IDs = collections.Counter([value for value in values])
    shannon_entropy_value = 0
    for ID in IDs:
        # number of occurrences of this value
        n_i = IDs[ID]
        # relative frequency: n_i / m (m = total number of observations)
        p_i = n_i / float(m)
        entropy_i = p_i * (math.log(p_i, 2))
        shannon_entropy_value += entropy_i
    # a single distinct value gives 0; return it directly to avoid -0.0
    if shannon_entropy_value == 0:
        return 0
    return shannon_entropy_value * -1

Ep = df_dropped.groupby(['vendor', "brand"])["id_ram"].apply(estimate_shannon_entropy)

#print(Ep)
# create dataframe
Ep = pd.DataFrame(Ep)

#print(Ep)
# add column to the new dataset
bvs['Ep'] = Ep
bvs['Ep'] = bvs['Ep'].fillna(0)

print(bvs)
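
The entropy used here is `H = -sum_i p_i * log2(p_i)`, where `p_i` is the relative frequency of the i-th distinct value. As a cross-check, the same quantity can be sketched with `value_counts`; the function name `shannon_entropy` is hypothetical and is not part of the notebook.

```python
import numpy as np
import pandas as pd

def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    # relative frequency of each distinct value
    p = pd.Series(values).value_counts(normalize=True).to_numpy()
    # adding 0.0 turns a possible -0.0 into 0.0 when only one value is present
    return float(-(p * np.log2(p)).sum()) + 0.0

print(shannon_entropy(["a", "a", "b", "b"]))  # two equally likely values
print(shannon_entropy(["a", "a", "a"]))       # no uncertainty
```

Entropy is 0 when a group always contains the same value and grows as the values become more evenly spread, with a maximum of log2(k) for k equally frequent values.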

#### Eb: Shannon's Entropy on the frequency and extent of the vendor's shopping sessions (time_code)

In [None]:
Eb = df_dropped.groupby(['vendor', "brand"])["time_code"].apply(estimate_shannon_entropy)

#print(Eb)
# create dataframe
Eb = pd.DataFrame(Eb)

#print(Eb)
# add column to the new dataset
bvs['Eb'] = Eb
bvs['Eb'] = bvs['Eb'].fillna(0)

print(bvs)

#### Ew: Shannon's Entropy on the weekday of the vendor's purchases (time_code.dt.day_name)

In [None]:
Ew = df_dropped.copy()
Ew['time_code'] = Ew.time_code.dt.day_name()   # English names by default; 'English' is a platform-specific locale name
Ew = Ew.groupby(['vendor', "brand"])["time_code"].apply(estimate_shannon_entropy)

#print(Ew)
# create dataframe
Ew = pd.DataFrame(Ew)

#print(Ew)
# add column to the new dataset
bvs['Ew'] = Ew
bvs['Ew'] = bvs['Ew'].fillna(0)

print(bvs)

#### Em: Shannon's Entropy on the month of the vendor's purchases (time_code.dt.month_name)

In [None]:
Em = df_dropped.copy()
Em['time_code'] = Em.time_code.dt.month_name()   # English names by default; 'English' is a platform-specific locale name
Em = Em.groupby(['vendor', "brand"])["time_code"].apply(estimate_shannon_entropy)

#print(Em)
# create dataframe
Em = pd.DataFrame(Em)

#print(Em)
# add column to the new dataset
bvs['Em'] = Em
bvs['Em'] = bvs['Em'].fillna(0)

print(bvs)

#### Stot: total amount spent by each vendor (USD)

In [None]:
Stot = df_join.groupby(['vendor', "brand"])["sales_usd"].sum()

#print(Stot)
Stot = pd.DataFrame(Stot)

#print(Stot)
# add column to the new dataset
bvs['Stot_USD'] = Stot["sales_usd"]
bvs['Stot_USD'] = bvs['Stot_USD'].fillna(0)

print(bvs)

#### Smax: Maximum amount spent by each vendor within a single shopping session

In [None]:
Smax = df_dropped.groupby(['vendor', "brand", "time_code"])["sales_usd"].sum()

Smax_list = []
for vendor in df_dropped.vendor.unique():
    for brand in df_dropped.brand.unique():
        try: Smax_list.append(Smax.loc[vendor,brand].max())
        except: Smax_list.append(0)

# add column to the new dataset
bvs['Smax'] = Smax_list
bvs['Smax'] = bvs['Smax'].fillna(0)

print(bvs)

#### Savg: average amount spent by each vendor within a single shopping session

In [None]:
Savg = df_dropped.groupby(['vendor', "brand", "time_code"])["sales_usd"].sum()

Savg_list = []
for vendor in df_dropped.vendor.unique():
    for brand in df_dropped.brand.unique():
        try: Savg_list.append(Savg.loc[vendor,brand].mean())
        except: Savg_list.append(0)

# add column to the new dataset
bvs['Savg'] = Savg_list
bvs['Savg'] = bvs['Savg'].fillna(0)

print(bvs)

#### SWmax: Maximum amount spent by each vendor within a week

In [None]:
# dt.week was removed in pandas 2.0; dt.isocalendar().week is its replacement
SWmax = df_dropped.groupby(['vendor', "brand", df_dropped["time_code"].dt.isocalendar().week])["sales_usd"].sum()

SWmax_list = []
for vendor in df_dropped.vendor.unique():
    for brand in df_dropped.brand.unique():
        try: SWmax_list.append(SWmax.loc[vendor,brand].max())
        except: SWmax_list.append(0)

# add column to the new dataset
bvs['SWmax'] = SWmax_list
bvs['SWmax'] = bvs['SWmax'].fillna(0)

print(bvs)

#### SWavg: Average amount spent by each vendor within a week

In [None]:
# dt.week was removed in pandas 2.0; dt.isocalendar().week is its replacement
SWavg = df_dropped.groupby(['vendor', "brand", df_dropped["time_code"].dt.isocalendar().week])["sales_usd"].sum()

SWavg_list = []
for vendor in df_dropped.vendor.unique():
    for brand in df_dropped.brand.unique():
        try: SWavg_list.append(SWavg.loc[vendor,brand].mean())
        except: SWavg_list.append(0)

# add column to the new dataset
bvs['SWavg'] = SWavg_list
bvs['SWavg'] = bvs['SWavg'].fillna(0)

print(bvs)

#### SMmax: Maximum amount spent by each vendor within a month

In [None]:
SMmax = df_dropped.groupby(['vendor', "brand", df_dropped["time_code"].dt.month])["sales_usd"].sum()

SMmax_list = []
for vendor in df_dropped.vendor.unique():
    for brand in df_dropped.brand.unique():
        try: SMmax_list.append(SMmax.loc[vendor,brand].max())
        except: SMmax_list.append(0)

# add column to the new dataset
bvs['SMmax'] = SMmax_list
bvs['SMmax'] = bvs['SMmax'].fillna(0)

print(bvs)

#### SMavg: Average amount spent by each vendor within a month

In [None]:
SMavg = df_dropped.groupby(['vendor', "brand", df_dropped["time_code"].dt.month])["sales_usd"].sum()

SMavg_list = []
for vendor in df_dropped.vendor.unique():
    for brand in df_dropped.brand.unique():
        try: SMavg_list.append(SMavg.loc[vendor,brand].mean())
        except: SMavg_list.append(0)

# add column to the new dataset
bvs['SMavg'] = SMavg_list
bvs['SMavg'] = bvs['SMavg'].fillna(0)

print(bvs)

#### NSess: number of shopping sessions

In [None]:
NSess = df_dropped.groupby(['vendor', "brand"])["time_code"].nunique()

#print(NSess)
NSess = pd.DataFrame(NSess)

#print(NSess)
# add column to the new dataset
bvs['NSess'] = NSess["time_code"]
bvs['NSess'] = bvs['NSess'].fillna(0)

print(bvs)

#### Cont_Max: number of distinct continents in the vendor's transactions

In [None]:
Cont_Max = df_dropped.groupby([df_dropped['vendor'], df_dropped['brand']])["continent"].nunique()

Cont_Max = pd.DataFrame(Cont_Max)

# add column to the new dataset
bvs['Cont_Max'] = Cont_Max["continent"]
bvs['Cont_Max'] = bvs['Cont_Max'].fillna(0)

print(bvs)

#### Dim_unik: number of distinct memory sizes (memory_dim) in the vendor's transactions

In [None]:
Dim_unik = df_dropped.groupby([df_dropped['vendor'], df_dropped['brand']])["memory_dim"].nunique()

Dim_unik = pd.DataFrame(Dim_unik)

# add column to the new dataset
bvs['Dim_unik'] = Dim_unik["memory_dim"]
bvs['Dim_unik'] = bvs['Dim_unik'].fillna(0)

print(bvs)

#### Type_unik: number of distinct memory types (memory_type) in the vendor's transactions

In [None]:
Type_unik = df_dropped.groupby([df_dropped['vendor'], df_dropped['brand']])["memory_type"].nunique()

Type_unik = pd.DataFrame(Type_unik)

# add column to the new dataset
bvs['Type_unik'] = Type_unik["memory_type"]
bvs['Type_unik'] = bvs['Type_unik'].fillna(0)

print(bvs)

#### clok_unik: number of distinct clock values (clock) in the vendor's transactions

In [None]:
clok_unik = df_dropped.groupby([df_dropped['vendor'], df_dropped['brand']])["clock"].nunique()

clok_unik = pd.DataFrame(clok_unik)

# add column to the new dataset
bvs['clok_unik'] = clok_unik["clock"]
bvs['clok_unik'] = bvs['clok_unik'].fillna(0)

print(bvs)

#### coun_unik: number of distinct countries in the vendor's transactions

In [None]:
coun_unik = df_dropped.groupby([df_dropped['vendor'], df_dropped['brand']])["country"].nunique()

coun_unik = pd.DataFrame(coun_unik)

# add column to the new dataset
bvs['coun_unik'] = coun_unik["country"]
bvs['coun_unik'] = bvs['coun_unik'].fillna(0)

print(bvs)

#### reg_unik: number of distinct regions in the vendor's transactions

In [None]:
reg_unik = df_dropped.groupby([df_dropped['vendor'], df_dropped['brand']])["region"].nunique()

reg_unik = pd.DataFrame(reg_unik)

# add column to the new dataset
bvs['reg_unik'] = reg_unik["region"]
bvs['reg_unik'] = bvs['reg_unik'].fillna(0)

print(bvs)

#### day_unik: number of distinct days of the month (time_code.dt.day) in the vendor's transactions

In [None]:
df_with_day = df_dropped.copy()
df_with_day['time_code'] = df_with_day.time_code.dt.day

day_unik = df_with_day.groupby([df_with_day['vendor'], df_with_day['brand']])["time_code"].nunique()

day_unik = pd.DataFrame(day_unik)

# add column to the new dataset
bvs['day_unik'] = day_unik["time_code"]
bvs['day_unik'] = bvs['day_unik'].fillna(0)

print(bvs)

#### week_unik: number of distinct weeks of the year in the vendor's transactions

In [None]:
df_with_week = df_dropped.copy()
df_with_week['time_code'] = df_with_week.time_code.dt.isocalendar().week   # dt.week was removed in pandas 2.0

week_unik = df_with_week.groupby([df_with_week['vendor'], df_with_week['brand']])["time_code"].nunique()

week_unik = pd.DataFrame(week_unik)

# add column to the new dataset
bvs['week_unik'] = week_unik["time_code"]
bvs['week_unik'] = bvs['week_unik'].fillna(0)

print(bvs)

#### price_unik: number of distinct prices (sales_usd) in the vendor's transactions

In [None]:
price_unik = df_dropped.groupby([df_dropped['vendor'], df_dropped['brand']])["sales_usd"].nunique()

price_unik = pd.DataFrame(price_unik)

# add column to the new dataset
bvs['price_unik'] = price_unik["sales_usd"]
bvs['price_unik'] = bvs['price_unik'].fillna(0)

print(bvs)

#### Rate_avg: average conversion rate across the vendor's transactions

In [None]:
Rate_avg = df_dropped.groupby(['vendor', "brand"])["conversion_rate"].mean()

Rate_avg = pd.DataFrame(Rate_avg)

# add column to the new dataset
bvs['Rate_avg'] = Rate_avg
bvs['Rate_avg'] = bvs['Rate_avg'].fillna(0)

print(bvs)

#### Categorical Attributes

##### Country: Country associated with the majority of the vendor's transactions

In [None]:
fav_country = df_dropped.groupby(['vendor', "brand", 'country'])['country'].count()

Fav = {}
# Series.iteritems() was removed in pandas 2.0; items() is the equivalent call
for x in fav_country.items():
    if (x[0][0], x[0][1]) not in Fav:
        tmp= x[1]
        Fav[(x[0][0], x[0][1])] = x[0][2]
        for y in fav_country.items():
            if tmp < y[1]:
                if x[0][0] == y[0][0]:
                    if x[0][1] == y[0][1]:
                        tmp = y[1]
                        Fav[(y[0][0], y[0][1])] = y[0][2]                            

#print(Fav)

# create dataframe
Fav = pd.DataFrame.from_dict(Fav, orient='index')


# add column to the new dataset
bvs_cat['fav_country'] = Fav
bvs_cat['fav_country'] = bvs_cat['fav_country'].fillna(0)

print(bvs_cat)
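
The nested loop above scans the whole Series once for every (vendor, brand) pair. The same "most frequent value per pair" can be sketched with a grouped `idxmax`. This is only an illustration on a hypothetical `toy` frame; `counts`, `top` and `fav_country` are local names chosen for the sketch.

```python
import pandas as pd

# Hypothetical stand-in for df_dropped
toy = pd.DataFrame({
    "vendor":  ["v1", "v1", "v1", "v2", "v2"],
    "brand":   ["A", "A", "A", "A", "A"],
    "country": ["Italy", "Germany", "Italy", "France", "Spain"],
})

# Number of transactions per (vendor, brand, country)
counts = toy.groupby(["vendor", "brand", "country"]).size()

# For each (vendor, brand), idxmax returns the full index label
# (vendor, brand, country) of the largest count; on ties the first label wins.
top = counts.groupby(level=["vendor", "brand"]).idxmax()

# Keep only the country part of each label
fav_country = top.map(lambda key: key[2])
print(fav_country)
```

Ties are resolved in favour of the first label in sorted order, which matches the strict `<` comparison used in the loop. The same pattern applies to the other "fav_" attributes by changing the third grouping column.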

##### Fav_weekday: day of the week during which the vendor tends to spend the most

In [None]:
df_with_day_week = df_dropped.copy()
df_with_day_week['time_code'] = df_with_day_week.time_code.dt.day_name()   # English names by default

spent_per_day_week = df_with_day_week.groupby(['vendor', "brand", 'time_code']).sales_usd.sum()
Fav = {}
for x in spent_per_day_week.items():
    if (x[0][0], x[0][1]) not in Fav:
        tmp= x[1]
        Fav[(x[0][0], x[0][1])] = x[0][2]
        for y in spent_per_day_week.items():
            if tmp < y[1]:
                if x[0][0] == y[0][0]:
                    if x[0][1] == y[0][1]:
                        tmp = y[1]
                        Fav[(y[0][0], y[0][1])] = y[0][2] 


# create dataframe
Fav = pd.DataFrame.from_dict(Fav, orient='index')

# add column to the new dataset
bvs_cat['Fav_weekday'] = Fav
bvs_cat['Fav_weekday'] = bvs_cat['Fav_weekday'].fillna(0)

print(bvs_cat)

##### fav_month: month during which the vendor tends to spend the most

In [None]:
df_with_months = df_dropped.copy()
df_with_months['time_code'] = df_with_months.time_code.dt.month_name()   # English names by default

spent_per_month = df_with_months.groupby(['vendor', "brand", 'time_code']).sales_usd.sum()

Fav = {}
for x in spent_per_month.items():
    if (x[0][0], x[0][1]) not in Fav:
        tmp= x[1]
        Fav[(x[0][0], x[0][1])] = x[0][2]
        for y in spent_per_month.items():
            if tmp < y[1]:
                if x[0][0] == y[0][0]:
                    if x[0][1] == y[0][1]:
                        tmp = y[1]
                        Fav[(y[0][0], y[0][1])] = y[0][2] 

# create dataframe
Fav = pd.DataFrame.from_dict(Fav, orient='index')

# add column to the new dataset
bvs_cat['Fav_month'] = Fav
bvs_cat['Fav_month'] = bvs_cat['Fav_month'].fillna(0)

print(bvs_cat)

##### fav_dim: most frequent memory size (memory_dim) in the vendor's transactions

In [None]:
fav_dim = df_dropped.groupby(['vendor', "brand", 'memory_dim'])['memory_dim'].count()

Fav = {}
for x in fav_dim.items():
    if (x[0][0], x[0][1]) not in Fav:
        tmp= x[1]
        Fav[(x[0][0], x[0][1])] = x[0][2]
        for y in fav_dim.items():
            if tmp < y[1]:
                if x[0][0] == y[0][0]:
                    if x[0][1] == y[0][1]:
                        tmp = y[1]
                        Fav[(y[0][0], y[0][1])] = y[0][2]                            

#print(Fav)

# create dataframe
Fav = pd.DataFrame.from_dict(Fav, orient='index')


# add column to the new dataset
bvs_cat['fav_dim'] = Fav
bvs_cat['fav_dim'] = bvs_cat['fav_dim'].fillna(0)

print(bvs_cat)

In [None]:
fav_dim = df_dropped.groupby(['vendor', "brand", 'memory_type'])['memory_type'].count().reset_index(name="count")
tmp = fav_dim[(fav_dim["vendor"]=="geizhals_unknown") & (fav_dim["brand"]=="APACER")]
print(tmp)

##### fav_price: most frequent price (sales_usd) in the vendor's transactions

In [None]:
fav_price = df_dropped.groupby(['vendor', "brand", 'sales_usd'])['sales_usd'].count()

Fav = {}
for x in fav_price.items():
    if (x[0][0], x[0][1]) not in Fav:
        tmp= x[1]
        Fav[(x[0][0], x[0][1])] = x[0][2]
        for y in fav_price.items():
            if tmp < y[1]:
                if x[0][0] == y[0][0]:
                    if x[0][1] == y[0][1]:
                        tmp = y[1]
                        Fav[(y[0][0], y[0][1])] = y[0][2] 


#print(Fav)

# create dataframe
Fav = pd.DataFrame.from_dict(Fav, orient='index')


# add column to the new dataset
bvs_cat['fav_price'] = Fav
bvs_cat['fav_price'] = bvs_cat['fav_price'].fillna(0)

print(bvs_cat)

##### fav_model: most frequent RAM model (ram_model) in the vendor's transactions

In [None]:
fav_model = df_dropped.groupby(['vendor', "brand", 'ram_model'])['ram_model'].count()

Fav = {}
for x in fav_model.items():
    if (x[0][0], x[0][1]) not in Fav:
        tmp= x[1]
        Fav[(x[0][0], x[0][1])] = x[0][2]
        for y in fav_model.items():
            if tmp < y[1]:
                if x[0][0] == y[0][0]:
                    if x[0][1] == y[0][1]:
                        tmp = y[1]
                        Fav[(y[0][0], y[0][1])] = y[0][2] 


#print(Fav)

# create dataframe
Fav = pd.DataFrame.from_dict(Fav, orient='index')


# add column to the new dataset
bvs_cat['fav_model'] = Fav
bvs_cat['fav_model'] = bvs_cat['fav_model'].fillna(0)

print(bvs_cat)

##### fav_mem_type: most frequent memory type (memory_type) in the vendor's transactions

In [None]:
fav_mem_type = df_dropped.groupby(['vendor', "brand", 'memory_type'])['memory_type'].count()

Fav = {}
for x in fav_mem_type.items():
    if (x[0][0], x[0][1]) not in Fav:
        tmp= x[1]
        Fav[(x[0][0], x[0][1])] = x[0][2]
        for y in fav_mem_type.items():
            if tmp < y[1]:
                if x[0][0] == y[0][0]:
                    if x[0][1] == y[0][1]:
                        tmp = y[1]
                        Fav[(y[0][0], y[0][1])] = y[0][2] 


#print(Fav)

# create dataframe
Fav = pd.DataFrame.from_dict(Fav, orient='index')


# add column to the new dataset
bvs_cat['fav_mem_type'] = Fav
bvs_cat['fav_mem_type'] = bvs_cat['fav_mem_type'].fillna(0)

print(bvs_cat)

##### fav_clock: most frequent clock value (clock) in the vendor's transactions

In [None]:
fav_clock = df_dropped.groupby(['vendor', "brand", 'clock'])['clock'].count()

Fav = {}
for x in fav_clock.items():
    if (x[0][0], x[0][1]) not in Fav:
        tmp= x[1]
        Fav[(x[0][0], x[0][1])] = x[0][2]
        for y in fav_clock.items():
            if tmp < y[1]:
                if x[0][0] == y[0][0]:
                    if x[0][1] == y[0][1]:
                        tmp = y[1]
                        Fav[(y[0][0], y[0][1])] = y[0][2] 


#print(Fav)

# create dataframe
Fav = pd.DataFrame.from_dict(Fav, orient='index')


# add column to the new dataset
bvs_cat['fav_clock'] = Fav
bvs_cat['fav_clock'] = bvs_cat['fav_clock'].fillna(0)

print(bvs_cat)

##### fav_continent: continent associated with the majority of the vendor's transactions

In [None]:
fav_continent = df_dropped.groupby(['vendor', "brand", 'continent'])['continent'].count()

Fav = {}
for x in fav_continent.items():
    if (x[0][0], x[0][1]) not in Fav:
        tmp= x[1]
        Fav[(x[0][0], x[0][1])] = x[0][2]
        for y in fav_continent.items():
            if tmp < y[1]:
                if x[0][0] == y[0][0]:
                    if x[0][1] == y[0][1]:
                        tmp = y[1]
                        Fav[(y[0][0], y[0][1])] = y[0][2] 


#print(Fav)

# create dataframe
Fav = pd.DataFrame.from_dict(Fav, orient='index')


# add column to the new dataset
bvs_cat['fav_continent'] = Fav
bvs_cat['fav_continent'] = bvs_cat['fav_continent'].fillna(0)

print(bvs_cat)

##### fav_region: region associated with the majority of the vendor's transactions

In [None]:
fav_region = df_dropped.groupby(['vendor', "brand", 'region'])['region'].count()

Fav = {}
for x in fav_region.items():
    if (x[0][0], x[0][1]) not in Fav:
        tmp= x[1]
        Fav[(x[0][0], x[0][1])] = x[0][2]
        for y in fav_region.items():
            if tmp < y[1]:
                if x[0][0] == y[0][0]:
                    if x[0][1] == y[0][1]:
                        tmp = y[1]
                        Fav[(y[0][0], y[0][1])] = y[0][2] 


#print(Fav)

# create dataframe
Fav = pd.DataFrame.from_dict(Fav, orient='index')


# add column to the new dataset
bvs_cat['fav_region'] = Fav
bvs_cat['fav_region'] = bvs_cat['fav_region'].fillna(0)

print(bvs_cat)

##### Conv_most: most frequent conversion rate in the vendor's transactions

In [None]:
Conv_most = df_dropped.groupby(['vendor', "brand", 'conversion_rate'])['conversion_rate'].count()

Fav = {}
for x in Conv_most.items():
    if (x[0][0], x[0][1]) not in Fav:
        tmp= x[1]
        Fav[(x[0][0], x[0][1])] = x[0][2]
        for y in Conv_most.items():
            if tmp < y[1]:
                if x[0][0] == y[0][0]:
                    if x[0][1] == y[0][1]:
                        tmp = y[1]
                        Fav[(y[0][0], y[0][1])] = y[0][2] 


#print(Fav)

# create dataframe
Fav = pd.DataFrame.from_dict(Fav, orient='index')


# add column to the new dataset
bvs_cat['Conv_most'] = Fav
bvs_cat['Conv_most'] = bvs_cat['Conv_most'].fillna(0)

print(bvs_cat)

##### Conv_less: least frequent conversion rate in the vendor's transactions

In [None]:
Conv_less = df_dropped.groupby(['vendor', "brand", 'conversion_rate'])['conversion_rate'].count()

Fav = {}
for x in Conv_less.items():
    if (x[0][0], x[0][1]) not in Fav:
        tmp= x[1]
        Fav[(x[0][0], x[0][1])] = x[0][2]
        for y in Conv_less.items():
            if tmp > y[1]:
                if x[0][0] == y[0][0]:
                    if x[0][1] == y[0][1]:
                        tmp = y[1]
                        Fav[(y[0][0], y[0][1])] = y[0][2] 


#print(Fav)

# create dataframe
Fav = pd.DataFrame.from_dict(Fav, orient='index')


# add column to the new dataset
bvs_cat['Conv_less'] = Fav
bvs_cat['Conv_less'] = bvs_cat['Conv_less'].fillna(0)

print(bvs_cat)

##### Conv_max: maximum conversion rate in the vendor's transactions

In [None]:
Conv_max = df_dropped.groupby(['vendor', "brand"])["conversion_rate"].max()

Conv_max = pd.DataFrame(Conv_max)

# add column to the new dataset
bvs_cat['Conv_max'] = Conv_max
bvs_cat['Conv_max'] = bvs_cat['Conv_max'].fillna(0)

print(bvs_cat)

##### Conv_min: minimum conversion rate in the vendor's transactions

In [None]:
Conv_min = df_dropped.groupby(['vendor', "brand"])["conversion_rate"].min()

Conv_min = pd.DataFrame(Conv_min)

# add column to the new dataset
bvs_cat['Conv_min'] = Conv_min
bvs_cat['Conv_min'] = bvs_cat['Conv_min'].fillna(0)

print(bvs_cat)

##### fav_week: week of the year with the most transactions for the vendor

In [None]:
df_with_week = df_dropped.copy()
df_with_week['time_code'] = df_with_week.time_code.dt.isocalendar().week   # dt.week was removed in pandas 2.0

fav_week = df_with_week.groupby(['vendor', "brand", "time_code"]).time_code.count()

Fav = {}
for x in fav_week.items():
    if (x[0][0], x[0][1]) not in Fav:
        tmp= x[1]
        Fav[(x[0][0], x[0][1])] = x[0][2]
        for y in fav_week.items():
            if tmp < y[1]:
                if x[0][0] == y[0][0]:
                    if x[0][1] == y[0][1]:
                        tmp = y[1]
                        Fav[(y[0][0], y[0][1])] = y[0][2] 


#print(Fav)

# create dataframe
Fav = pd.DataFrame.from_dict(Fav, orient='index')


# add column to the new dataset
bvs_cat['fav_week'] = Fav
bvs_cat['fav_week'] = bvs_cat['fav_week'].fillna(0)

print(bvs_cat)

#### Statistics on the Vendor dataset

In [None]:
bvs = bvs[(bvs.I != 0)]
bvs_cat = bvs_cat[(bvs_cat.fav_continent != 0)]


In [None]:
# stats on new dataset XXX
print(bvs.describe())
print()
print(bvs_cat.describe())
print()

print(bvs.info())
print()
print(bvs_cat.info())


In [None]:
# data distribution
mpl.rc('figure', max_open_warning = 0)
sturge_number = math.trunc(np.log2(len(bvs)))
for col in bvs:  
    print("\n\nCONSIDERATIONS ABOUT THE ATTRIBUTE: " + col)
    plt.figure(figsize = (10,8))   # create the figure first so the histogram is drawn on it
    bvs[col].hist(bins = sturge_number + 1)  #Sturges' rule
    pl.suptitle(col)    
    #plt.savefig(dir+'\\Histogram\\'+col+'-hist.jpg')
    plt.xticks(rotation=45)
    plt.show()
    print()

In [None]:
# visualization on categorical
for x in bvs_cat:  
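    # take labels and heights from the same Series: unique() keeps order of appearance,
    # while value_counts() sorts by frequency, so pairing them mislabels the bars
    counts = bvs_cat[x].value_counts()
    plt.bar(counts.index, counts.values)
    plt.xticks(rotation=90, horizontalalignment="center")
    plt.title("Distribution of " + x)
    plt.xlabel(x)
    plt.ylabel("Number of vendor-brand pairs")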
    plt.show()
    print()

In [None]:
tmp3= bvs.copy()
tmp4= bvs_cat.copy()

In [None]:
# outliers
for col in bvs:  
    fig, ax = plt.subplots()
    ax.set_title('\nOutliers of ' +col+ ' in the Dataset')
    ax.boxplot(bvs[col])
    plt.show()

    print("---------------------------------------------------------------")
    
def iqr_values(s):
    q1 = s.quantile(q = 0.25)

    q3 = s.quantile(q = 0.75)    

    iqr = q3 - q1

    iqr_left = q1 - 3*iqr
    
    iqr_right = q3 + 3*iqr
    
    return iqr_left, iqr_right


for x in bvs:
    if (x != "Imin") & (x != "Type_unik"):
        print("OUTLIERS REPRESENTATION FOR ATTRIBUTE :\t" + x)
        left_sale, right_sale = iqr_values(bvs[x])
        print(left_sale)
        print(right_sale)
        bvs[(bvs[x] > left_sale) & (bvs[x] < right_sale)][x].plot.box()
        plt.show()
        print(bvs)
        outliers = bvs[(bvs[x] < left_sale) | (bvs[x] > right_sale)]
        print("\n Outliers found")
        print()
        print(outliers.describe())
        print()
        # keep every outlier row: de-duplicating here would leave duplicated outliers in bvs
        print("--------------------------------------------------")
        print("\n Dataset dropped")
        #vs_dropped[x] = bvs[x][(bvs[x] > left_sale) & (bvs[x] < right_sale)]
        bvs.drop(outliers.index, inplace=True, errors="ignore")
        bvs_cat.drop(outliers.index, inplace=True)
        print(bvs.describe())
        print()
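
The outlier rule applied in the loop above keeps values inside the fences `[Q1 - 3*IQR, Q3 + 3*IQR]`. A factor of 3 corresponds to Tukey's outer fences, while the more common inner fences use 1.5. The following is a minimal sketch of that rule on a hypothetical `toy` frame; `iqr_fences`, `is_outlier` and `cleaned` are illustrative names.

```python
import pandas as pd

def iqr_fences(s, k=3.0):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR of a Series."""
    q1 = s.quantile(0.25)
    q3 = s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical numeric features; 100 is an obvious extreme value for "I"
toy = pd.DataFrame({"I": [1, 2, 3, 4, 100], "Iu": [1, 1, 2, 2, 3]})

low, high = iqr_fences(toy["I"])
is_outlier = (toy["I"] < low) | (toy["I"] > high)

# Boolean masking returns a new frame instead of dropping rows in place
cleaned = toy.loc[~is_outlier]
print(low, high)
print(cleaned)
```

Filtering with a mask avoids mutating a frame while iterating over its columns, which makes the effect of each attribute's filter easier to inspect.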


In [None]:
print("\nCorrelation Matrix with Spearman::")
corr = bvs.corr(method='spearman')
sns.set(font_scale=0.8)
#plt.figure(figsize = (7,7))
ax = sns.heatmap(corr, vmin=-0.1, vmax=1, linewidths=1, annot=True)
ax.set_xticklabels(ax.get_xticklabels(),rotation=45,horizontalalignment='right')   #Look the method
plt.rc('figure', figsize=(18, 7))
plt.show()


# correlation
print("\nCorrelation Matrix with Pearson::")
#plt.rc('figure', figsize=(18, 7))
corr = bvs.corr(method='pearson')
sns.set(font_scale=0.8)
#plt.figure(figsize = (7,7))
ax = sns.heatmap(corr, vmin=-0.1, vmax=1, linewidths=1, annot=True)
ax.set_xticklabels(ax.get_xticklabels(),rotation=45,horizontalalignment='right')   #Look the method
plt.rc('figure', figsize=(18, 7))
plt.show()


print("\nHeatmap correlation::")
sns.heatmap(bvs.corr(), annot=True);

"""for x in bvs:
    for y in bvs:
        if x != y:
            plt.figure(figsize = (15,15))
            plt.scatter(bvs[x], bvs[y])
            plt.xlabel(x)
            plt.ylabel(y)
            plt.title('\nCorrelation between ' + x + ' and ' + y + ' in vs')
            plt.show()"""

#---- maintain attributes below a certain threshold
high_corr = []
print(high_corr)
threshold = 0.80
list_corr = list(corr.to_numpy())
ext_ind = 0
for i in list_corr:
    int_ind = 0
    for j in i:
        if j > threshold and int_ind != ext_ind:
            high_corr.append(int_ind)
        int_ind += 1
    ext_ind += 1
print("Attributes above threshold of " + str(threshold) + " are:\n")
print(list(bvs.columns[high_corr]))
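
Because the correlation matrix is symmetric, the loop above records both members of every highly correlated pair, so dropping `high_corr` removes both attributes. If the goal is to keep one attribute per pair, a possible sketch is to inspect only the upper triangle of the matrix. The `toy` frame and the names `upper`, `to_drop` and `reduced` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric features: "b" is a multiple of "a", "c" is unrelated
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
})

corr = toy.corr(method="pearson")

# Keep only the entries strictly above the diagonal; the rest becomes NaN
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

threshold = 0.80
# A column is dropped if it is highly correlated with any earlier column
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
reduced = toy.drop(columns=to_drop)
print(to_drop)
```

As in the notebook, only positive correlations above the threshold are considered; comparing `upper.abs()` instead would also catch strong negative correlations.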




In [None]:
tmp = bvs.drop(columns=list(bvs.columns[high_corr]), axis=1)

# correlation
print("\nCorrelation Matrix with Pearson::")
corr = tmp.corr(method='pearson')
sns.set(font_scale=0.8)
#plt.figure(figsize = (7,7))
ax = sns.heatmap(corr, vmin=-0.1, vmax=1, linewidths=1, annot=True)
ax.set_xticklabels(ax.get_xticklabels(),rotation=45,horizontalalignment='right')   #Look the method
plt.rc('figure', figsize=(18, 7))
plt.show()

In [None]:
#bvs = tmp3.copy()
#bvs_cat = tmp4.copy()

## Final Exports

In [None]:
vs_num = vs.copy()
vs_num.to_csv(path_or_buf=dir + '\\Data\\vs_num.csv')
vs_cat.to_csv(path_or_buf=dir + '\\Data\\vs_cat.csv')

bvs_num = tmp.copy()
bvs_num.to_csv(path_or_buf=dir + '\\Data\\bvs_num.csv')
bvs_cat.to_csv(path_or_buf=dir + '\\Data\\bvs_cat.csv')

df_dropped.to_csv(path_or_buf=dir + '\\Data\\df_dropped.csv')

############################################# END ######################################################

###### FUTURE CONSIDERATIONS
- RFM Analysis
RFM (Recency, Frequency, Monetary) analysis is a customer segmentation technique that uses past purchase behavior to divide customers into groups, in order to identify those who are more likely to respond to promotions and to support future personalization services. Frequency is the number of orders placed by each customer; Recency is the number of days between the present date and the date of each customer's last purchase; Monetary is the total amount spent by each customer.


- Leave out extreme values from the sample (for instance, the 3% smallest and the 3% largest values) when calculating and displaying the histogram; alternatively, deviate from the principle of bins of equal width.


- Convert numerical values into standard units, especially if data from different sources (and different countries) are used.
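
As an illustration of the RFM idea listed above, the three measures could be sketched as follows. This is not part of the notebook: it assumes that the vendor plays the role of the customer, that one distinct `time_code` counts as one order, and that the most recent date in the data stands in for the "present date". The `toy` frame, `reference_date` and `rfm` are hypothetical names.

```python
import pandas as pd

# Hypothetical transactions with the column names used in this notebook
toy = pd.DataFrame({
    "vendor": ["v1", "v1", "v2", "v2", "v2"],
    "time_code": pd.to_datetime([
        "2020-01-01", "2020-01-10",
        "2020-01-03", "2020-01-04", "2020-01-05",
    ]),
    "sales_usd": [10.0, 20.0, 5.0, 5.0, 5.0],
})

# Assumption: the latest date in the data is used as the "present date"
reference_date = toy["time_code"].max()

rfm = toy.groupby("vendor").agg(
    # days since the vendor's last transaction
    recency=("time_code", lambda d: (reference_date - d.max()).days),
    # number of distinct sessions
    frequency=("time_code", "nunique"),
    # total amount in USD
    monetary=("sales_usd", "sum"),
)
print(rfm)
```

Each of the three columns can then be binned (for example into quantiles) to form the segments that RFM analysis works with.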
