# Pandas Tutorial (One Shot)
---

In [1]:
import pandas as pd
import numpy as np

>#### Creating Series

A Series can be created from a Python list, a NumPy array, or a dictionary. A range of mathematical operations can also be applied to a Series; each returns a new Series.

In [26]:
# series by python list
list_1 = ['a','b','c','d']
list_1_labels = [1,2,3,4]
series_1 = pd.Series(data=list_1, index=list_1_labels)

# series by numpy array
np_arr_1 = np.array([1,2,3,4])
series_2 = pd.Series(np_arr_1)

# series by dict
dict_1 = {'name':'ath', 'age':21, 'nationality':'indian'}
series_3 = pd.Series(dict_1, name='ath details')
series_3.name # returns the series name defined

# mathematical operations
series_2 + series_2
series_2 * series_2
np.exp(series_2)

ath details


0     2.718282
1     7.389056
2    20.085537
3    54.598150
dtype: float64
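
One detail of Series arithmetic is easy to miss: operations match values by index label, not by position. The sketch below uses two small hypothetical Series (`s_a` and `s_b`, not part of the cells above) to illustrate this. Labels present in only one operand produce `NaN` unless a `fill_value` is supplied.

```python
import pandas as pd

# Hypothetical Series with partially overlapping index labels
s_a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
s_b = pd.Series([10, 20, 30], index=['y', 'z', 'w'])

# Addition matches values by label, not by position
total = s_a + s_b

# Labels found in only one Series yield NaN; fill_value treats the missing side as 0
total_filled = s_a.add(s_b, fill_value=0)

print(total)
print(total_filled)
```

Only `y` and `z` exist in both Series, so only those two labels get a numeric sum in `total`.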

>#### Creating Dataframes

*`pd.DataFrame()` parameters...*
- `data` (values)
- `index` (row labels)
- `columns` (column labels)
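
Values, row labels and column labels map to the `data`, `index` and `columns` keyword arguments of `pd.DataFrame`, and passing them by keyword keeps the call readable. A minimal sketch with made-up values (the `scores` frame and its labels are illustrative only):

```python
import numpy as np
import pandas as pd

# Illustrative values; any 2-D array-like works as data
values = np.array([[1, 2], [3, 4], [5, 6]])

scores = pd.DataFrame(
    data=values,                # the cell values
    index=['r1', 'r2', 'r3'],   # row labels
    columns=['a', 'b'],         # column labels
)

print(scores.shape)
print(scores.loc['r2', 'b'])
```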

In [36]:
# array -> dataframe

# generate random integers from 10 (inclusive) to 50 (exclusive) in an array with 3 rows and 3 cols
np_arr_2 = np.random.randint(10,50,size=(3,3))
df_1 = pd.DataFrame(np_arr_2,['Row1', 'Row2', 'Row3'], ['Col1', 'Col2', 'Col3'])
df_1

# dictionary -> dataframe
dict_2 = { 'col1': pd.Series([1,2,3], index=['row1', 'row2', 'row3']),
            'col2': pd.Series([1.,2.,3.,4.], index=['row1', 'row2', 'row3', 'row4'])
} 
df_2 = pd.DataFrame(dict_2)
df_2

| | col1 | col2 |
|---|---|---|
| row1 | 1.0 | 1.0 |
| row2 | 2.0 | 2.0 |
| row3 | 3.0 | 3.0 |
| row4 | NaN | 4.0 |


>#### Editing and Retrieving Data

- Obtain columns using...
    - `dataframe.column_name` OR `dataframe['column_name']`

- Obtain rows and/or columns using...
    - `dataframe.loc[row_label,column_label]` (also accepts lists of labels or label slices; in a slice, both the start and end labels are inclusive)

    - `dataframe.iloc[row_index,column_index]` (also accepts lists of positions or position slices; in a slice, the start position is inclusive and the end position is exclusive)

- Add new columns using `dataframe[<new_column>] = values` and new rows using `dataframe = pd.concat([dataframe, <new_row>.to_frame().T])`, where `<new_row>` is a `pd.Series` (`dataframe.append(<new_row>)` was removed in pandas 2.0)
    - Note: If you want the rows to be indexed using 0,1,2,3... add `ignore_index=True` in the `pd.concat()` call
    - If you want to name the row manually, specify `name=<new_row_name>` in the `pd.Series()` argument

- Delete rows/columns with `dataframe.drop()` by passing the row/column name and the axis (`axis=0` for deleting rows; `axis=1` for deleting columns); add `inplace=True` to modify the dataframe directly instead of returning a new one
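
The difference between label slices and position slices is the part of the list above that most often trips people up. The following sketch, on a small frame with arbitrary values (`demo` is illustrative), shows two slices that select the same cells:

```python
import pandas as pd

# Small illustrative frame (values are arbitrary)
demo = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=['Row1', 'Row2', 'Row3'],
    columns=['Col1', 'Col2', 'Col3'],
)

# Label slice: both endpoints are included
by_label = demo.loc['Row1':'Row2', 'Col1':'Col2']

# Position slice: the end is excluded, so 0:2 covers positions 0 and 1
by_position = demo.iloc[0:2, 0:2]

print(by_label.equals(by_position))
```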

In [74]:
# get the first column
df_1['Col1']
df_1.loc[:,'Col1']
df_1.iloc[:,0]

# get the first row
df_1.loc['Row1',:]
df_1.iloc[0,:]

# get multiple columns
df_1[['Col1','Col2']]
df_1.loc[:,['Col1', 'Col2']]
df_1.iloc[:,0:2]

# get rows & columns (by label)
df_1.loc['Row1':'Row3', 'Col1':'Col2']
df_1.loc[['Row1','Row3'], ['Col1','Col3']]

# get rows & columns (by index)
df_1.iloc[0:3, 0:3]
df_1.iloc[[0,2], [0,2]]

df_1.loc['Row1', 'Col2'] # returns a particular value

| | Col1 | Col3 |
|---|---|---|
| Row1 | 24 | 25 |
| Row3 | 26 | 29 |


In [97]:
# adding a new column
df_1['Col_Total'] = df_1['Col1'] + df_1['Col2'] + df_1['Col3']
df_1

# adding a new row (DataFrame.append was removed in pandas 2.0, so pd.concat is used)
new_row = pd.Series(df_1.loc['Row1'] + df_1.loc['Row2'] + df_1.loc['Row3'], name='Row_Total')
df_1 = pd.concat([df_1, new_row.to_frame().T])
df_1

| | Col1 | Col2 | Col3 | Col_Total |
|---|---|---|---|---|
| Row1 | 24 | 18 | 25 | 67 |
| Row2 | 35 | 42 | 22 | 99 |
| Row3 | 26 | 46 | 29 | 101 |
| Row_Total | 85 | 106 | 76 | 267 |
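
As a sketch of how the row label behaves when a row is added with `pd.concat` (which works on current pandas versions, where `DataFrame.append` is no longer available), the example below adds the same row twice: once keeping the Series name as the label, once renumbering with `ignore_index=True`. The frame `base` and the label `'extra'` are made up for illustration.

```python
import pandas as pd

# Illustrative frame with default integer row labels
base = pd.DataFrame({'Col1': [1, 2], 'Col2': [3, 4]})

# A new row as a Series; its name becomes the row label
new_row = pd.Series({'Col1': 5, 'Col2': 6}, name='extra')

# to_frame().T turns the Series into a one-row DataFrame
named = pd.concat([base, new_row.to_frame().T])

# ignore_index=True discards the existing labels and renumbers the rows
renumbered = pd.concat([base, new_row.to_frame().T], ignore_index=True)

print(list(named.index))
print(list(renumbered.index))
```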


In [98]:
# deleting columns
df_1.drop('Col_Total', axis=1, inplace=True)

# deleting rows
df_1.drop('Row_Total', axis=0, inplace=True)

| | Col1 | Col2 | Col3 |
|---|---|---|---|
| Row1 | 24 | 18 | 25 |
| Row2 | 35 | 42 | 22 |
| Row3 | 26 | 46 | 29 |
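
`inplace=True` is optional. Without it, `drop()` returns a new dataframe and leaves the original untouched, which can be easier to reason about. A small sketch (the `frame` below is illustrative), also using the `columns=` and `index=` keywords as an alternative to `axis`:

```python
import pandas as pd

frame = pd.DataFrame(
    {'Col1': [1, 2], 'Col2': [3, 4], 'Col3': [5, 6]},
    index=['Row1', 'Row2'],
)

# Without inplace=True, drop returns a new frame; the original is unchanged
without_col = frame.drop(columns='Col3')   # same as frame.drop('Col3', axis=1)
without_row = frame.drop(index='Row2')     # same as frame.drop('Row2', axis=0)

# Several labels can be dropped in one call
only_col1 = frame.drop(columns=['Col2', 'Col3'])

print(frame.shape)
```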


In [103]:
# set index
df_1['Row_names'] = ['Row1','Row2','Row3']
df_1.set_index('Row_names', inplace=True)

# reset index
df_1.reset_index(inplace=True)
df_1.drop('Row_names', axis=1, inplace=True)

| | Col1 | Col2 | Col3 |
|---|---|---|---|
| 0 | 24 | 18 | 25 |
| 1 | 35 | 42 | 22 |
| 2 | 26 | 46 | 29 |
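
The reset step above takes two calls because `reset_index()` moves the old index back into a column, which then has to be dropped. Passing `drop=True` discards the old index directly. A minimal sketch with an illustrative frame:

```python
import pandas as pd

frame = pd.DataFrame({'Row_names': ['Row1', 'Row2'], 'Col1': [10, 20]})

# set_index without inplace returns a new frame indexed by the chosen column
indexed = frame.set_index('Row_names')

# drop=True discards the old index instead of restoring it as a column
restored = indexed.reset_index(drop=True)

print(list(restored.columns))
```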


>#### Conditional Selection

In [137]:
np_arr_3 = np.random.randint(1,10, size=(3,3))
df_3 = pd.DataFrame(np_arr_3, index=None, columns=['Col1','Col2','Col3'])
df_3

| | Col1 | Col2 | Col3 |
|---|---|---|---|
| 0 | 3 | 5 | 3 |
| 1 | 5 | 4 | 6 |
| 2 | 8 | 3 | 4 |


In [186]:
# returns the rows where the value in the first column satisfies the condition
df_3[df_3['Col1'] <= 5]

| | Col1 | Col2 | Col3 |
|---|---|---|---|
| 0 | 3 | 5 | 3 |
| 1 | 5 | 4 | 6 |
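
Conditions can also be combined. The sketch below hard-codes the same values as the `df_3` output shown earlier (so the result is reproducible without the random generator) and combines masks with `&`, `|` and `.loc`. The variable names are illustrative.

```python
import pandas as pd

# Same values as the df_3 output, hard-coded for reproducibility
df = pd.DataFrame({'Col1': [3, 5, 8], 'Col2': [5, 4, 3], 'Col3': [3, 6, 4]})

# Each condition needs its own parentheses; use & (and), | (or), ~ (not)
both = df[(df['Col1'] <= 5) & (df['Col2'] > 4)]
either = df[(df['Col1'] > 5) | (df['Col3'] == 3)]

# .loc accepts a boolean mask together with a column selection
col3_only = df.loc[df['Col1'] <= 5, 'Col3']

print(both)
print(either)
print(col3_only)
```

Python's `and`/`or` keywords do not work on boolean Series, which is why the element-wise operators are used here.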


>#### File Input/Output

- Pandas can work with...
    - .csv files: 
        - Read = `pd.read_csv(filename)`
        - Write = `dataframe.to_csv(filename)`
    - .xlsx files: 
        - Read = `pd.read_excel(filename)`
        - Write = `dataframe.to_excel(filename)`

    - Databases:
        ```
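        import pymysql

        db_connection = None
        try:
            # database_name, user_name, password_name, port_num and table_name are placeholders
            db_connection = pymysql.connect(database=database_name, user=user_name, password=password_name, host='localhost', port=port_num)  # check the port number
            dataframe = pd.read_sql('SELECT * FROM table_name', con=db_connection)
            print(dataframe)

        except Exception as e:
            print('Exception {}'.format(e))

        finally:
            # close the connection only if it was actually opened
            if db_connection is not None:
                db_connection.close()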
        ```
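
To try the read/write calls without an existing file or a MySQL server, the sketch below round-trips a small made-up frame through a temporary CSV file and through an in-memory SQLite database. SQLite is swapped in for MySQL here purely so the example is self-contained; the table name `weather` and the data are assumptions.

```python
import os
import sqlite3
import tempfile

import pandas as pd

frame = pd.DataFrame({'city': ['NYC', 'LA'], 'temp': [31, 65]})

# CSV round trip: index=False keeps the row labels out of the file,
# so no extra 'Unnamed: 0' column appears when reading it back
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'demo.csv')
    frame.to_csv(csv_path, index=False)
    from_csv = pd.read_csv(csv_path)

# SQL round trip using an in-memory SQLite database
connection = sqlite3.connect(':memory:')
try:
    frame.to_sql('weather', connection, index=False)
    from_sql = pd.read_sql('SELECT * FROM weather WHERE temp > 40', con=connection)
finally:
    connection.close()

print(from_csv)
print(from_sql)
```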

>#### Dataframe basics

In [191]:
weather_df = pd.read_csv('nyc_weather_data.csv')

In [215]:
weather_df.head() # returns the first 5 rows
weather_df.tail() # returns the last 5 rows

# convert to numpy array
weather_df.to_numpy()

# get all row indices as an array
weather_df.index.array 

# add 1 to all values in a dataframe
df_3.transform(lambda x: x+1)

# add 2 to all values in the first column
# and subtract 1 from all values in the second column
df_3.transform({'Col1': lambda x: x+2,
                'Col2': lambda x: x-1,
                'Col3': lambda x: x})




| | Col1 | Col2 | Col3 |
|---|---|---|---|
| 0 | 5 | 4 | 3 |
| 1 | 7 | 3 | 6 |
| 2 | 10 | 2 | 4 |
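
For simple arithmetic like this, plain vectorized operators give the same result as `transform()` without a lambda. The sketch below hard-codes the `df_3` values shown earlier and uses `assign()` for the per-column case; this is one possible alternative, not the only way to write it.

```python
import pandas as pd

# Same values as the df_3 output, hard-coded for reproducibility
df = pd.DataFrame({'Col1': [3, 5, 8], 'Col2': [5, 4, 3], 'Col3': [3, 6, 4]})

# Element-wise arithmetic is vectorized, so no lambda is needed
plus_one = df + 1

# assign returns a new frame with the named columns replaced
adjusted = df.assign(Col1=df['Col1'] + 2, Col2=df['Col2'] - 1)

# The vectorized result matches the transform-based version
same_all = plus_one.equals(df.transform(lambda x: x + 1))

print(adjusted)
print(same_all)
```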
