# Python - Basic Functions

### Libraries in Python:

1. Scientific computing:

    - Pandas (for: data structures, dataframes, and data tools)

    - NumPy (for: arrays and matrices)

    - SciPy (for: integrals, solving differential equations, optimization)


2. Visualization:

    - Matplotlib (for: plots and graphs)

    - Seaborn (for: heat maps, time series, violin plots)


3. Algorithmic:

    - Scikit-learn (for: machine learning)

    - Statsmodels (for: exploring data, estimating statistical models, and performing statistical tests)


## Import/Export data in Python

To download data from a website, we can use the command `!wget https://'Path where the CSV file is stored\File name'` in a notebook cell. Then use one of the following commands:

#### Import: `pd.read_`:
>  **csv:**  pd.read_csv('Path where the file is stored\File name.csv')
>
> **json:**  pd.read_json('Path where the file is stored\File name.json')
>
> **excel:** pd.read_excel('Path where the file is stored\File name.xlsx')
>
> **sql:**   pd.read_sql('SQL query or table name', connection)

#### Export: `df.to_`:
>  **csv:**  df.to_csv('Path where the file is stored\File name.csv')
>
> **json:**  df.to_json('Path where the file is stored\File name.json')
>
> **excel:** df.to_excel('Path where the file is stored\File name.xlsx')
>
> **sql:**   df.to_sql('Table name', connection)
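
The export and import functions listed above can be paired in a round trip. The following is a minimal sketch, assuming a small hypothetical dataframe `games`, a temporary directory, and an in-memory SQLite database with a hypothetical table named `games`; none of these names come from the original notebook.

```python
import sqlite3
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data, used only for illustration
games = pd.DataFrame({"Name": ["Pokémon Giallo", "Pokémon Platino"],
                      "Year": [1998, 2008]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "games.csv"
    json_path = Path(tmp) / "games.json"

    # Export, then import again from the same files
    games.to_csv(csv_path, index=False)
    games.to_json(json_path, orient="records")
    from_csv = pd.read_csv(csv_path)
    from_json = pd.read_json(json_path)

# SQL goes through a database connection rather than a file path
conn = sqlite3.connect(":memory:")
games.to_sql("games", conn, index=False)
from_sql = pd.read_sql("SELECT * FROM games", conn)
conn.close()

print(from_csv)
print(from_json)
print(from_sql)
```

Note that `read_sql` and `to_sql` take a query or table name plus a connection, unlike the file-based readers and writers.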

Syntax:

**`pd.read_csv('Path where the CSV file is stored\File name.csv', sep=';', header='infer', index_col=None)`**

where: 

- 'Path where the CSV file is stored\File name.csv' : location of the file.
- sep = ';' : delimiter to use (the default is ',').
- delimiter = None : alternative argument name for sep.
- header = 'infer' : row number(s) to use as the column names and as the start of the data; 'infer' takes the names from the first row.
- index_col = None : column(s) to use as the row labels of the DataFrame.
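
To see how these arguments interact without needing a file on disk, here is a small sketch that reads semicolon-separated text from an in-memory buffer. The buffer contents and the variable names `raw` and `sample` are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated content standing in for a file on disk
raw = io.StringIO(
    "Name;Console;Year\n"
    "Pokémon Giallo;Game Boy;1998\n"
    "Pokémon Cristallo;Game Boy Color;2000\n"
)

# sep=';' splits the fields, header='infer' takes the first row as column
# names, and index_col='Name' turns that column into the row labels
sample = pd.read_csv(raw, sep=";", header="infer", index_col="Name")

print(sample.columns.tolist())
print(sample.loc["Pokémon Giallo", "Year"])
```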

In [3]:
# e.g.
import pandas as pd
df = pd.read_csv('Pokemon.csv', sep=';',header='infer', index_col = ['Name']) #import data
df

| Name | Console | Year |
|---|---|---|
| Pokémon Rosso e Verde | Game Boy | 1996 |
| Pokémon Rosso e Blu | Game Boy | 1998 |
| Pokémon Giallo | Game Boy | 1998 |
| Pokémon Oro e Argento | Game Boy Color | 1999 |
| Pokémon Cristallo | Game Boy Color | 2000 |
| Pokémon Rubino e Zaffiro | Game Boy Advance | 2002 |
| Pokémon Rosso Fuoco e Verde Foglia | Game Boy Advance | 2004 |
| Pokémon Smeraldo | Game Boy Advance | 2004 |
| Pokémon Diamante e Perla | Nintendo DS | 2006 |
| Pokémon Platino | Nintendo DS | 2008 |


## Basic 

### 0. Convert into dataframe

A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes. It consists of three principal components: the data, the rows, and the columns.

- data[ ]  : 1 bracket --> pandas series
- data[[ ]] : 2 brackets --> pandas dataframe
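
The difference between one and two brackets can be checked directly. This is a minimal sketch on a hypothetical two-row dataframe called `frame`, not the notebook's own data.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
frame = pd.DataFrame({"Console": ["Game Boy", "Nintendo DS"],
                      "Year": [1996, 2006]})

one_bracket = frame["Year"]      # a single label returns a Series
two_brackets = frame[["Year"]]   # a list of labels returns a DataFrame

print(type(one_bracket).__name__)
print(type(two_brackets).__name__)
```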

In [4]:
data = pd.DataFrame(df, index = None, columns = ['Console','Year']) #convert into dataframe
data

| Name | Console | Year |
|---|---|---|
| Pokémon Rosso e Verde | Game Boy | 1996 |
| Pokémon Rosso e Blu | Game Boy | 1998 |
| Pokémon Giallo | Game Boy | 1998 |
| Pokémon Oro e Argento | Game Boy Color | 1999 |
| Pokémon Cristallo | Game Boy Color | 2000 |
| Pokémon Rubino e Zaffiro | Game Boy Advance | 2002 |
| Pokémon Rosso Fuoco e Verde Foglia | Game Boy Advance | 2004 |
| Pokémon Smeraldo | Game Boy Advance | 2004 |
| Pokémon Diamante e Perla | Nintendo DS | 2006 |
| Pokémon Platino | Nintendo DS | 2008 |


### 1. Types

The `dtypes` attribute shows the data type of each column of a dataframe.

In [5]:
data.dtypes

Console    object
Year        int64
dtype: object
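
Once the column types are known, they can also be used to pick columns. The sketch below assumes a hypothetical dataframe `frame` and uses `select_dtypes` to keep only the numeric columns.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
frame = pd.DataFrame({"Console": ["Game Boy", "Nintendo DS"],
                      "Year": [1996, 2006]})

print(frame.dtypes)

# Keep only the columns whose dtype is numeric
numeric_only = frame.select_dtypes(include="number")
print(numeric_only.columns.tolist())
```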

### 2. Describe

`describe( )` shows basic statistical details, such as the count, mean, standard deviation, and percentiles, of a dataframe or a series of numeric values.

In [6]:
data.describe()

| | Year |
|---|---|
| count | 12.0 |
| mean | 2002.833333 |
| std | 4.725816 |
| min | 1996.0 |
| 25% | 1998.75 |
| 50% | 2003.0 |
| 75% | 2006.5 |
| max | 2010.0 |
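
The reported percentiles can be changed with the `percentiles` argument. The following sketch uses a short hypothetical series `years` rather than the full notebook data.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
years = pd.Series([1996, 1998, 1998, 1999, 2000], name="Year")

summary = years.describe()                       # default summary statistics
custom = years.describe(percentiles=[0.1, 0.9])  # request other percentiles

print(summary)
print(custom)
```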


### 3. Printing the dataframe

`head( )` shows the top and `tail( )` shows the bottom of the dataframe (five rows by default).

In [7]:
data.head() #Top 

| Name | Console | Year |
|---|---|---|
| Pokémon Rosso e Verde | Game Boy | 1996 |
| Pokémon Rosso e Blu | Game Boy | 1998 |
| Pokémon Giallo | Game Boy | 1998 |
| Pokémon Oro e Argento | Game Boy Color | 1999 |
| Pokémon Cristallo | Game Boy Color | 2000 |


In [8]:
data.tail() #Bottom

| Name | Console | Year |
|---|---|---|
| Pokémon Smeraldo | Game Boy Advance | 2004 |
| Pokémon Diamante e Perla | Nintendo DS | 2006 |
| Pokémon Platino | Nintendo DS | 2008 |
| Pokémon Oro HeartGold e Argento SoulSilver | Nintendo DS | 2009 |
| Pokémon Nero e Bianco | Nintendo DS | 2010 |
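
Both methods accept the number of rows to return. A minimal sketch, assuming a hypothetical single-column dataframe `frame`:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
frame = pd.DataFrame({"Year": [1996, 1998, 1998, 1999, 2000, 2002, 2004]})

first_two = frame.head(2)   # first 2 rows instead of the default 5
last_three = frame.tail(3)  # last 3 rows

print(first_two)
print(last_three)
```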


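### 4. Information about the dataframe

`info( )` prints a concise summary of the dataframe: the index, the columns with their non-null counts and dtypes, and the memory usage.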

In [9]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 12 entries, Pokémon Rosso e Verde to Pokémon Nero e Bianco 
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Console  12 non-null     object
 1   Year     12 non-null     int64 
dtypes: int64(1), object(1)
memory usage: 288.0+ bytes


### 5. Remove missing data

`dropna( )` removes rows or columns that contain missing values.

- rows: **axis = 0**
- columns: **axis = 1**

In [10]:
data.dropna(subset=['Console'], axis = 0, inplace = True)
data.head()

| Name | Console | Year |
|---|---|---|
| Pokémon Rosso e Verde | Game Boy | 1996 |
| Pokémon Rosso e Blu | Game Boy | 1998 |
| Pokémon Giallo | Game Boy | 1998 |
| Pokémon Oro e Argento | Game Boy Color | 1999 |
| Pokémon Cristallo | Game Boy Color | 2000 |
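
Because the notebook data has no missing values, the effect of `axis` and `subset` is easier to see on data that does. The sketch below assumes a hypothetical dataframe `frame` with one gap in each column.

```python
import pandas as pd

# Hypothetical sample data with one missing value per column
frame = pd.DataFrame({"Console": ["Game Boy", None, "Nintendo DS"],
                      "Year": [1996, 1998, None]})

rows_dropped = frame.dropna(axis=0)             # drop rows with any missing value
cols_dropped = frame.dropna(axis=1)             # drop columns with any missing value
subset_only = frame.dropna(subset=["Console"])  # only check the Console column

print(rows_dropped)
print(cols_dropped.shape)
print(subset_only)
```

Without `inplace = True`, each call returns a new dataframe and leaves `frame` unchanged.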


### 6. Remove column

In [11]:
data.drop(columns=['Console'], inplace = True) # drop the column by its label
data.head()

| Name | Year |
|---|---|
| Pokémon Rosso e Verde | 1996 |
| Pokémon Rosso e Blu | 1998 |
| Pokémon Giallo | 1998 |
| Pokémon Oro e Argento | 1999 |
| Pokémon Cristallo | 2000 |
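
A column can also be removed without modifying the original dataframe. This sketch assumes a hypothetical dataframe `frame` and keeps the result in a new variable.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
frame = pd.DataFrame({"Console": ["Game Boy", "Nintendo DS"],
                      "Year": [1996, 2006]})

# Without inplace=True, drop returns a new dataframe
without_console = frame.drop(columns=["Console"])

print(without_console.columns.tolist())
print(frame.columns.tolist())  # the original still has both columns
```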


### 7. Replace data

Syntax for Python strings:

`str.replace(old, new, count)`

For dataframes, the pandas equivalent is `df.replace(to_replace, value)`.

In [12]:
txt = "I never play with Pokémon!"
x = txt.replace("never", "always")
print(x)

I always play with Pokémon!


In [13]:
data.rename(columns={'Year':'years'}, inplace = True) # rename the column 'Year' to 'years'
data

| Name | years |
|---|---|
| Pokémon Rosso e Verde | 1996 |
| Pokémon Rosso e Blu | 1998 |
| Pokémon Giallo | 1998 |
| Pokémon Oro e Argento | 1999 |
| Pokémon Cristallo | 2000 |
| Pokémon Rubino e Zaffiro | 2002 |
| Pokémon Rosso Fuoco e Verde Foglia | 2004 |
| Pokémon Smeraldo | 2004 |
| Pokémon Diamante e Perla | 2006 |
| Pokémon Platino | 2008 |
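
The string example in this section works on plain Python text. For values stored in a dataframe, a sketch along the following lines may help; the dataframe `frame` and the abbreviations are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
frame = pd.DataFrame({"Console": ["Game Boy", "Game Boy Color", "Nintendo DS"]})

# Replace whole cell values that match exactly
exact = frame.replace("Nintendo DS", "NDS")

# Replace substrings inside a text column with the .str accessor
partial = frame["Console"].str.replace("Game Boy", "GB", regex=False)

print(exact)
print(partial)
```

`DataFrame.replace` with plain values only touches cells whose entire value matches, which is why "Game Boy Color" is left alone in `exact`.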


### 8. Evaluating for Missing Data

In [14]:
missing_data = data.notnull() # True where a value is present; data.isnull() gives the inverse (True where a value is missing)
missing_data

| Name | years |
|---|---|
| Pokémon Rosso e Verde | True |
| Pokémon Rosso e Blu | True |
| Pokémon Giallo | True |
| Pokémon Oro e Argento | True |
| Pokémon Cristallo | True |
| Pokémon Rubino e Zaffiro | True |
| Pokémon Rosso Fuoco e Verde Foglia | True |
| Pokémon Smeraldo | True |
| Pokémon Diamante e Perla | True |
| Pokémon Platino | True |
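
A boolean mask like the one above is often summed to count the missing values per column. The sketch below assumes a hypothetical dataframe `frame` with one gap in each column.

```python
import pandas as pd

# Hypothetical sample data with one missing value per column
frame = pd.DataFrame({"Console": ["Game Boy", None, "Nintendo DS"],
                      "Year": [1996, 1998, None]})

missing_mask = frame.isnull()            # True where a value is missing
missing_per_column = missing_mask.sum()  # each True counts as 1

print(missing_per_column)
```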


### 9. Count data

In [15]:
count = data["years"].value_counts()
count

1998    2
2004    2
1999    1
1996    1
2010    1
2009    1
2008    1
2006    1
2002    1
2000    1
Name: years, dtype: int64
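
`value_counts( )` can also return relative frequencies instead of absolute counts. A minimal sketch on a hypothetical series `years`:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
years = pd.Series([1998, 1998, 2004, 1996], name="years")

counts = years.value_counts()                # occurrences, most frequent first
shares = years.value_counts(normalize=True)  # fraction of all rows per value

print(counts)
print(shares)
```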

### 10. Change Type

In [16]:
avg = data["years"].astype("float") # convert the column from int64 to float64 (returns a new Series)
avg

Name
Pokémon Rosso e Verde                          1996.0
Pokémon Rosso e Blu                            1998.0
Pokémon Giallo                                 1998.0
Pokémon Oro e Argento                          1999.0
Pokémon Cristallo                              2000.0
Pokémon Rubino e Zaffiro                       2002.0
Pokémon Rosso Fuoco e Verde Foglia             2004.0
Pokémon Smeraldo                               2004.0
Pokémon Diamante e Perla                       2006.0
Pokémon Platino                                2008.0
Pokémon Oro HeartGold e Argento SoulSilver     2009.0
Pokémon Nero e Bianco                          2010.0
Name: years, dtype: float64
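
`astype` raises an error if a value cannot be converted. The sketch below, on a hypothetical text series `raw`, contrasts it with `pd.to_numeric`, which can turn unparseable entries into missing values instead.

```python
import pandas as pd

# Hypothetical sample data: years stored as text, one of them invalid
raw = pd.Series(["1996", "1998", "unknown"])

# astype works when every value is convertible
as_int = raw.iloc[:2].astype("int64")

# to_numeric with errors="coerce" replaces invalid entries with NaN
coerced = pd.to_numeric(raw, errors="coerce")

print(as_int)
print(coerced)
```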

### 11. Groupby

In [17]:
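grouped = data.groupby('Name', as_index=True) # returns a GroupBy object; nothing is computed until an aggregation is applied
data # groupby does not modify the original dataframe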

| Name | years |
|---|---|
| Pokémon Rosso e Verde | 1996 |
| Pokémon Rosso e Blu | 1998 |
| Pokémon Giallo | 1998 |
| Pokémon Oro e Argento | 1999 |
| Pokémon Cristallo | 2000 |
| Pokémon Rubino e Zaffiro | 2002 |
| Pokémon Rosso Fuoco e Verde Foglia | 2004 |
| Pokémon Smeraldo | 2004 |
| Pokémon Diamante e Perla | 2006 |
| Pokémon Platino | 2008 |
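
A `groupby` call only becomes useful once it is followed by an aggregation. The following sketch assumes a hypothetical dataframe `frame` that still has a `Console` column to group on.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
frame = pd.DataFrame({"Console": ["Game Boy", "Game Boy", "Nintendo DS", "Nintendo DS"],
                      "Year": [1996, 1998, 2006, 2008]})

# Earliest release year per console
first_year = frame.groupby("Console")["Year"].min()

# Number of rows per console
games_per_console = frame.groupby("Console").size()

print(first_year)
print(games_per_console)
```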


### 12. Dummies

`get_dummies( )` converts a categorical feature into one indicator (dummy) column per unique category.

In [18]:
import pandas as pd
# Create a dataframe
raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 
        'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'], 
        'sex': ['male', 'female', 'male', 'female', 'female']}
df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'sex'])
df

| | first_name | last_name | sex |
|---|---|---|---|
| 0 | Jason | Miller | male |
| 1 | Molly | Jacobson | female |
| 2 | Tina | Ali | male |
| 3 | Jake | Milner | female |
| 4 | Amy | Cooze | female |


In [19]:
# Create a set of dummy variables from the sex variable
pd.get_dummies(df, columns=['sex'])

| | first_name | last_name | sex_female | sex_male |
|---|---|---|---|---|
| 0 | Jason | Miller | 0 | 1 |
| 1 | Molly | Jacobson | 1 | 0 |
| 2 | Tina | Ali | 0 | 1 |
| 3 | Jake | Milner | 1 | 0 |
| 4 | Amy | Cooze | 1 | 0 |
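
Two optional arguments are worth knowing here: `dtype` controls the type of the indicator columns, and `drop_first` removes one redundant column per feature. This is a small sketch on a hypothetical dataframe `people`.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
people = pd.DataFrame({"first_name": ["Jason", "Molly", "Amy"],
                       "sex": ["male", "female", "female"]})

# dtype=int gives 0/1 indicator columns
dummies = pd.get_dummies(people, columns=["sex"], dtype=int)

# drop_first=True drops the first category (here: female)
reduced = pd.get_dummies(people, columns=["sex"], drop_first=True, dtype=int)

print(dummies)
print(reduced)
```

With only two categories, the single remaining column in `reduced` carries the same information as the pair in `dummies`.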
