## Introduction to Pandas

Pandas is a Python library for data manipulation and analysis. When in doubt, check the [docs](https://pandas.pydata.org/pandas-docs/stable/reference/index.html). <br><br>
Functions and methods to keep for life:
* .read_csv()
* .head()
* .columns
* display()
* pd.concat()
* .merge()
* .rename()

In [None]:
import pandas as pd

![pandas](https://media.giphy.com/media/zrdUjl6N99nLq/giphy.gif) 

In [None]:
# to import a file into a pandas DataFrame
df1 = pd.read_csv('file1.csv')

# to display the dataframe
df1

In [None]:
df1.head()

In [None]:
# sep sets the character that separates the fields in the file (here, a tab)
df2 = pd.read_csv('file2.txt', sep='\t')
df2  # as the last line of a cell, this displays the DataFrame with its nice formatting
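
To see what `sep` changes, here is a minimal sketch that reads the same kind of data from in-memory text instead of files. The strings `csv_text` and `tsv_text` are made up for illustration and are not the contents of `file1.csv` or `file2.txt`.

```python
import io
import pandas as pd

# Hypothetical file contents, kept in memory so the example runs anywhere
csv_text = "id,gender\n1,F\n2,M\n"
tsv_text = "id\tgender\n3\tM\n4\tF\n"

# Comma is the default separator, so no sep is needed here
comma_df = pd.read_csv(io.StringIO(csv_text))

# Tab-separated text needs sep='\t'
tab_df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Forgetting sep does not raise an error: each row is read as one single field
wrong_df = pd.read_csv(io.StringIO(tsv_text))

print(comma_df.shape, tab_df.shape, wrong_df.shape)
```

If a freshly loaded DataFrame has only one column, the separator is a good first thing to check.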

In [None]:
df2.tail()

In [None]:
df1.columns  # df2 has the same columns

In [None]:
# to stack one DataFrame under the other
data = pd.concat([df1, df2])
data

In [None]:
# What if we wanted to join two dataframes side by side?
# Hint: read the docs and observe the image below
data = pd.concat([df1, df2], axis=1)
data

![](https://education-team-2020.s3-eu-west-1.amazonaws.com/data-analytics/1.3+-+Axes+Explain+-+Data.jpg)
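
A small sketch of the two axes, using made-up DataFrames (`top` and `bottom` are hypothetical stand-ins for `df1` and `df2`):

```python
import pandas as pd

# Two tiny DataFrames with the same columns (illustrative data only)
top = pd.DataFrame({"id": [1, 2], "gender": ["F", "M"]})
bottom = pd.DataFrame({"id": [3, 4], "gender": ["M", "F"]})

# axis=0 (the default) stacks rows; ignore_index=True renumbers them 0..n-1
stacked = pd.concat([top, bottom], axis=0, ignore_index=True)

# axis=1 places the DataFrames side by side, aligned on the index
side_by_side = pd.concat([top, bottom], axis=1)

print(stacked.shape)       # (4, 2)
print(side_by_side.shape)  # (2, 4)
```

Note that the side-by-side result keeps both sets of column names, so `id` and `gender` each appear twice. Without `ignore_index=True`, the stacked result would keep the original index labels (0, 1, 0, 1).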

In [None]:
# The same idea as above (side by side), but using merge on the index
# to match rows on a shared column instead, use the 'on' parameter
df1.merge(df2, left_index=True, right_index=True)

#### Merge VS Concat VS Join: to know more click [here](https://realpython.com/pandas-merge-join-and-concat/). <br><br>
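
As a quick illustration of matching on a column, here is a minimal sketch with the `on` and `how` parameters. The `donors` and `incomes` DataFrames and their values are invented for this example.

```python
import pandas as pd

# Hypothetical tables that share an 'id' column
donors = pd.DataFrame({"id": [1, 2, 3], "gender": ["F", "M", "F"]})
incomes = pd.DataFrame({"id": [2, 3, 4], "income": [300, 450, 500]})

# Default how='inner': keep only the ids present in both tables
inner = donors.merge(incomes, on="id")

# how='left': keep every row of donors, filling missing incomes with NaN
left = donors.merge(incomes, on="id", how="left")

print(inner)
print(left)
```

Unlike `pd.concat`, which lines rows up by position or index, `merge` lines them up by the values in the key column.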

In [None]:
# it is good practice to standardize your column names because it makes them easier to work with
# standardized column names are meaningful strings, lowercase and with '_' instead of spaces

cols = []

# here we only lowercase each name; iterate over the names directly instead of using an index
for col in data.columns:
    cols.append(col.lower())

cols

In [None]:
# Extra Challenge: turn the for loop above into a list comprehension

In [None]:
data.columns = cols
data

In [None]:
# Another way of standardizing columns: the vectorized .str accessor
# (lowercase, so the names match the keys used when renaming)

data.columns = data.columns.str.lower()
data

#### Check out how I did that [here](https://towardsdatascience.com/using-string-methods-in-pandas-5e4509ff1f5f). <br><br>
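
String methods can also be chained to do the whole clean-up in one line. This is only a sketch: the column names below are invented to show the messy cases.

```python
import pandas as pd

# An empty DataFrame with deliberately messy, made-up column names
raw = pd.DataFrame(columns=["Control N", " Median Home Val", "GENDER"])

# strip outer whitespace, lowercase, then replace inner spaces with '_'
raw.columns = (
    raw.columns.str.strip()
    .str.lower()
    .str.replace(" ", "_", regex=False)
)

print(list(raw.columns))  # ['control_n', 'median_home_val', 'gender']
```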

In [None]:
# renaming columns
data = data.rename(columns={'controln':'id',
                            'hv1':'median_home_val',
                            'ic1':'median_household_income'})

data
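
One thing to watch with `.rename()`: keys that do not match any column are silently ignored, so a difference in case means nothing gets renamed. A minimal sketch with a made-up `frame`:

```python
import pandas as pd

# Hypothetical DataFrame with lowercase column names
frame = pd.DataFrame({"controln": [1, 2], "hv1": [100, 200]})

# Keys match the column names exactly, so both columns are renamed
renamed = frame.rename(columns={"controln": "id", "hv1": "median_home_val"})

# The key is uppercase and matches nothing: no error, no change
unchanged = frame.rename(columns={"CONTROLN": "id"})

# errors='raise' turns the silent miss into a KeyError
try:
    frame.rename(columns={"CONTROLN": "id"}, errors="raise")
    raised = False
except KeyError:
    raised = True

print(list(renamed.columns), list(unchanged.columns), raised)
```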

### Additional operations


In [None]:
data = pd.read_csv('merged_clean_ver1.csv', index_col=0)

In [None]:
data.info()

In [None]:
data.describe()

In [None]:
data.columns

In [None]:
data['gender'].value_counts()

In [None]:
data['gender'].unique()

In [None]:
data.shape

In [None]:
data.sample(frac=0.5)
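
The inspection methods above can be tried without the CSV file. The sketch below uses a small invented DataFrame called `people`; it is not the data in `merged_clean_ver1.csv`.

```python
import pandas as pd

# Tiny made-up dataset for illustration
people = pd.DataFrame({
    "gender": ["F", "M", "F", "F"],
    "income": [300, 450, 500, 250],
})

# How many rows fall in each category
counts = people["gender"].value_counts()

# The distinct values, in order of first appearance
unique_values = people["gender"].unique()

# Summary statistics for a numeric column
mean_income = people["income"].describe()["mean"]

# A random half of the rows; random_state makes the draw repeatable
half = people.sample(frac=0.5, random_state=0)

print(people.shape)  # (rows, columns)
print(counts)
print(unique_values)
print(mean_income)
print(half)
```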

![more](https://media.giphy.com/media/QoCoLo2opwUW4/giphy.gif)