# 1 Install pandas

Run `pip install pandas` in a terminal.

# 2 Import pandas

`import pandas as pd`

Read a CSV file into pandas: `df = pd.read_csv('path/file.csv')`

`read_csv` returns a **DataFrame**, which is a table of rows and columns.
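Here is a minimal sketch of the same call. It assumes you don't have a file handy, so `io.StringIO` stands in for the file path. The CSV text below is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV contents; in practice you would pass a path such as 'path/file.csv'
csv_text = """first,last,email
Jane,Doe,janedoe@gmail.com
John,Glassman,glassman@gmail.com
"""

# read_csv accepts a path or any file-like object and returns a DataFrame
df = pd.read_csv(io.StringIO(csv_text))
print(type(df))  # <class 'pandas.core.frame.DataFrame'>
print(df)
```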

**Several ways to glance at your dataframe**

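`df.shape`: the number of rows and columns, as a tuple. It is an attribute, so it takes no parentheses.

`df.info()`: a method (hence the parentheses) that prints each column's name, non-null count, and dtype.

*In the `info()` output, the dtype `object` usually means strings.*

`df.head()` and `df.tail()`: the first and last 5 rows. Pass a number, e.g. `df.head(10)`, to see more.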
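The sketch below runs these glance tools on a small DataFrame. The names and ages are hypothetical and serve only as illustration.

```python
import pandas as pd

# Small hypothetical DataFrame to inspect
df = pd.DataFrame({
    'name': ['Jane', 'John', 'Jing', 'Amy', 'Bo', 'Cy'],
    'age': [28, 35, 41, 23, 30, 52],
})

print(df.shape)    # (6, 2): an attribute, so no parentheses
df.info()          # a method: prints column names, non-null counts and dtypes
print(df.head(3))  # first 3 rows (default is 5)
print(df.tail(2))  # last 2 rows (default is 5)
```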

By default, pandas in a notebook shows at most 20 columns (and 60 rows) before truncating the output. You can change these limits with `pd.set_option`.

To change the number of displayed columns: `pd.set_option('display.max_columns',85)`

To change the number of displayed rows: `pd.set_option('display.max_rows',85)`
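Here is a short sketch of reading options back and changing them temporarily. It uses `pd.get_option`, `pd.option_context`, and `pd.reset_option` alongside the `set_option` calls above. These settings only affect how DataFrames are printed, not the data itself.

```python
import pandas as pd

# Raise the display limits globally
pd.set_option('display.max_columns', 85)
pd.set_option('display.max_rows', 85)
print(pd.get_option('display.max_columns'))  # 85

# Or change a setting only inside a block; it reverts afterwards
with pd.option_context('display.max_rows', 10):
    print(pd.get_option('display.max_rows'))  # 10
print(pd.get_option('display.max_rows'))      # 85

# Restore the default for rows
pd.reset_option('display.max_rows')
```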

# 3 Compare a plain Python dictionary with a DataFrame

In plain Python, without pandas, we can represent a table as a dictionary of lists.

Each key is a column name, and each value is the list holding that column's contents. Passing the dictionary to `pd.DataFrame()` turns it into a DataFrame.

In [4]:
import pandas as pd

people = {
    'first':['Jane', 'John', 'Jing','Amy'],
    'last':['Doe', 'Glassman', 'Murfey','Anderson'],
    'email':['janedoe@gmail.com', 'glassman@gmail.com', 'murfey@gmail.com','amy@gmail.com']
}

In [5]:
people['first']

['Jane', 'John', 'Jing', 'Amy']

In [6]:
df_people=pd.DataFrame(people)
df_people

|   | first | last | email |
|---|-------|------|-------|
| 0 | Jane | Doe | janedoe@gmail.com |
| 1 | John | Glassman | glassman@gmail.com |
| 2 | Jing | Murfey | murfey@gmail.com |
| 3 | Amy | Anderson | amy@gmail.com |


In [7]:
df_people['email']

0     janedoe@gmail.com
1    glassman@gmail.com
2      murfey@gmail.com
3         amy@gmail.com
Name: email, dtype: object

In [8]:
type(df_people['email'])

pandas.core.series.Series

This returns a **Series** rather than a list. A Series is one-dimensional data (here, a single column) with an index.

A DataFrame is essentially a collection of Series objects, one per column, that share the same index.
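The sketch below shows the DataFrame/Series relationship in both directions. It rebuilds a trimmed-down `people` table so that it runs on its own.

```python
import pandas as pd

people = {
    'first': ['Jane', 'John', 'Jing', 'Amy'],
    'last': ['Doe', 'Glassman', 'Murfey', 'Anderson'],
}
df_people = pd.DataFrame(people)

# Each column pulled out of a DataFrame is a Series sharing the DataFrame's index
first = df_people['first']
print(type(first))                          # <class 'pandas.core.series.Series'>
print(first.index.equals(df_people.index))  # True

# Going the other way: a dict of Series builds a DataFrame, aligned on the index
rebuilt = pd.DataFrame({'first': df_people['first'], 'last': df_people['last']})
print(rebuilt.equals(df_people))            # True

# A Series can be converted back to a plain list
print(first.tolist())                       # ['Jane', 'John', 'Jing', 'Amy']
```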

In [None]:
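df_people.email  # attribute-style access, same as df_people['email'] (works only for column names that are valid Python identifiers)

# to select several columns, pass a list, hence the inner brackets
df_people[['first','email']]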

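`df.columns`: the column labels (an attribute, so no parentheses)

`df.iloc[0]`: select a row by *integer location*, here the row at position 0

`df.loc[]`: select by *label*

`df.loc[0:2,'first':'email']`: rows labeled 0 to 2 and columns `first` through `email`. Unlike ordinary Python slicing, label slices with `loc` include both endpoints.

To select multiple rows, pass a list of positions (hence the double brackets): `df.iloc[[0,1]]`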

In [None]:
df_people.iloc[[0,1]]

In [None]:
# rows at positions 0 and 1, column at position 1 ('last')
df_people.iloc[[0,1],1]

In [None]:
df_people.loc[[0,1],['last','email']]

In [None]:
df_people.loc[0:2,'first':'email']
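The sketch below contrasts the endpoint rule for slices. With the default integer index, row labels and row positions coincide, so the endpoint rule is the only visible difference here.

```python
import pandas as pd

df_people = pd.DataFrame({
    'first': ['Jane', 'John', 'Jing', 'Amy'],
    'last': ['Doe', 'Glassman', 'Murfey', 'Anderson'],
    'email': ['janedoe@gmail.com', 'glassman@gmail.com',
              'murfey@gmail.com', 'amy@gmail.com'],
})

# iloc uses positions and, like Python slicing, excludes the stop value
by_position = df_people.iloc[0:2, 0:2]          # rows 0-1, columns 'first' and 'last'

# loc uses labels and includes BOTH endpoints of a slice
by_label = df_people.loc[0:2, 'first':'email']  # rows 0, 1 and 2, all three columns

print(by_position.shape)  # (2, 2)
print(by_label.shape)     # (3, 3)
```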

Count how many times each value (e.g. each survey response) appears in a column: `df['Hobbyist'].value_counts()`
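The `Hobbyist` column belongs to a survey dataset that isn't loaded in these notes. This sketch therefore builds a tiny hypothetical stand-in with made-up answers.

```python
import pandas as pd

# Hypothetical survey answers in a column named like the one above
df = pd.DataFrame({'Hobbyist': ['Yes', 'No', 'Yes', 'Yes', 'No', 'Yes']})

counts = df['Hobbyist'].value_counts()  # Series: value -> count, most frequent first
print(counts)

# normalize=True gives proportions instead of raw counts
shares = df['Hobbyist'].value_counts(normalize=True)
print(shares)
```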

# 4 How to set, reset, and use indexes

In [None]:
df_people.set_index('email')  # returns a new DataFrame; df_people itself is unchanged
df_people.set_index('email', inplace=True)  # modifies df_people's index in place

In [None]:
df_people.index

In [None]:
df_people.loc['janedoe@gmail.com']  # with email as the index, look up a row by its email label

In [None]:
df_people.reset_index(inplace=True)

In [None]:
df_people
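The sketch below goes through the same set/reset steps without `inplace=True`. It assigns the returned DataFrames to new names instead, which is a common alternative style.

```python
import pandas as pd

df_people = pd.DataFrame({
    'first': ['Jane', 'John'],
    'last': ['Doe', 'Glassman'],
    'email': ['janedoe@gmail.com', 'glassman@gmail.com'],
})

# Without inplace=True, set_index returns a new DataFrame and leaves the original alone
indexed = df_people.set_index('email')
print(list(df_people.columns))  # ['first', 'last', 'email']  (unchanged)
print(list(indexed.columns))    # ['first', 'last']

# With emails as the index, loc looks rows up by email label
print(indexed.loc['glassman@gmail.com', 'last'])  # Glassman

# reset_index moves the index back into a regular column and restores 0, 1, ...
restored = indexed.reset_index()
print(list(restored.columns))   # ['email', 'first', 'last']
```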

Set the index column while reading the file: `df = pd.read_csv('path/file.csv', index_col='column_name')`
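Here is a minimal sketch of `index_col` at load time. Hypothetical CSV text in an `io.StringIO` stands in for `'path/file.csv'`, and `email` plays the role of `'column_name'`.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk
csv_text = """email,first,last
janedoe@gmail.com,Jane,Doe
glassman@gmail.com,John,Glassman
"""

# index_col makes the 'email' column the index as the file is read
df = pd.read_csv(io.StringIO(csv_text), index_col='email')
print(df.index.name)                         # email
print(df.loc['janedoe@gmail.com', 'first'])  # Jane
```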

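`df.sort_index(ascending=False, inplace=True)` sorts by the index in descending order (reverse alphabetical for string labels). Leave out `ascending=False` to sort in ascending (alphabetical) order.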
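The sketch below shows both sort directions on a small string index. The email labels are reused from the example table.

```python
import pandas as pd

df = pd.DataFrame(
    {'first': ['Jing', 'Amy', 'Jane']},
    index=['murfey@gmail.com', 'amy@gmail.com', 'janedoe@gmail.com'],
)

ascending = df.sort_index()                  # A -> Z for string labels
descending = df.sort_index(ascending=False)  # Z -> A
print(list(ascending.index))
print(list(descending.index))
```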

# 5 Filtering: using conditionals to filter rows and columns


In [None]:
filt = (df_people['last'] == 'Doe')

# filt itself is a Series of booleans, one per row: True where last == 'Doe'

In [None]:
df_people[filt]

# same as df_people[df_people['last'] == 'Doe']

df_people.loc[filt, 'email']  # filtered rows, 'email' column only

In [None]:
filt1 = ((df_people['last'] == 'Doe')&(df_people['first']=='Jane'))
df_people.loc[filt1]

In [None]:
filt2 = ((df_people['last'] == 'Doe')|(df_people['first']=='John'))
df_people.loc[filt2]

In [None]:
df_people.loc[~filt2]  # ~ negates the mask, returning the rows where filt2 is False
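The sketch below explains why each condition above sits in its own parentheses. In Python, `&` and `|` bind more tightly than `==`. Python's `and`/`or`/`not` don't work element-wise on a Series either (they raise a `ValueError`), so pandas masks use `&`, `|`, and `~`.

```python
import pandas as pd

df_people = pd.DataFrame({
    'first': ['Jane', 'John', 'Jing', 'Amy'],
    'last': ['Doe', 'Glassman', 'Murfey', 'Anderson'],
})

# & (and), | (or) and ~ (not) combine boolean Series element-wise.
# Each comparison needs its own parentheses because & and | bind
# more tightly than ==.
both = (df_people['last'] == 'Doe') & (df_people['first'] == 'Jane')
either = (df_people['last'] == 'Doe') | (df_people['first'] == 'John')

print(df_people.loc[both, 'first'].tolist())     # ['Jane']
print(df_people.loc[either, 'first'].tolist())   # ['Jane', 'John']
print(df_people.loc[~either, 'first'].tolist())  # ['Jing', 'Amy']
```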

In [9]:
lastname=['Glassman','Doe','Anderson']
filt3=df_people['last'].isin(lastname)

In [11]:
df_people.loc[filt3, 'email']

0     janedoe@gmail.com
1    glassman@gmail.com
3         amy@gmail.com
Name: email, dtype: object

In [17]:
filt4=df_people['last'].str.contains('sm',na=False)
df_people.loc[filt4, 'email']

1    glassman@gmail.com
Name: email, dtype: object
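The sketch below shows what `na=False` guards against, using a hypothetical column with one missing value. For object-dtype columns, `str.contains` without `na=False` returns NaN for the missing entry. A mask containing NaN can't be used for filtering. The exact result without `na=False` may differ with other string dtypes, so only the `na=False` version is relied on here.

```python
import pandas as pd

# Hypothetical column with a missing value
last = pd.Series(['Doe', 'Glassman', None, 'Anderson'])

# Without na=False, the missing entry may come back as NaN instead of a boolean
print(last.str.contains('sm').tolist())

# na=False treats missing values as "no match", giving a clean boolean mask
mask = last.str.contains('sm', na=False)
print(mask.tolist())        # [False, True, False, False]
print(last[mask].tolist())  # ['Glassman']

# isin checks membership in a list of exact values (missing values never match)
print(last.isin(['Doe', 'Anderson']).tolist())  # [True, False, False, True]
```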