# Data Analysis with Python & Pandas Notes
These notes cover the essentials of data analysis, focusing on manipulating data with **Python** and **Pandas**.
## Table of Contents
1. Loading Data into Pandas
2. Reading Data in Pandas
3. Describing Data
4. Sorting Data
5. Making Changes to Data

# 1. Loading Data into Pandas
We load the data into a DataFrame (df) and manipulate the data from this df.

To find out which directory you are currently in:

In [None]:
pwd

In [None]:
import pandas as pd

#If your file is in csv format
df = pd.read_csv('pokemon_data.csv')

#If your file is in excel (xlsx) format
#df = pd.read_excel('pokemon_data.xlsx')

#If your file is in txt format (and data is separated by tabs - need to specify delimiter for this)
#df = pd.read_csv('pokemon_data.txt', delimiter = '\t')
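
As a self-contained sketch of the delimiter option, the snippet below reads tab-separated text from an in-memory string instead of a file. The sample rows and the names `sample_text` and `sample_df` are made up for illustration; `sep` is the main parameter name in `read_csv`, and `delimiter` is accepted as an alias for it.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample standing in for pokemon_data.txt
sample_text = "Name\tType 1\tHP\nBulbasaur\tGrass\t45\nCharmander\tFire\t39\n"

# io.StringIO lets read_csv treat the string like an open file
sample_df = pd.read_csv(io.StringIO(sample_text), sep='\t')

print(sample_df.shape)
print(list(sample_df.columns))
```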

In [None]:
print(df)

In [None]:
#To only show the top few rows of datasets

#print(df.head()) #default is top 5 rows.
print(df.head(3))

# 2. Reading Data in Pandas

In [None]:
#############################
# Read the Headers
#############################

print(df.columns) #columns is an attribute, not a method; it returns an Index of the headers

In [None]:
#############################
# Read each Column
#############################

print(df['Name']) #print out all the names of pokemon

print(df['Name'][0:5]) #print out the top 5 names of pokemon

In [None]:
#############################
# Read multiple Columns
#############################

print(df[['Name', 'Type 1', 'HP']]) #pass a list of column names (note the double brackets)
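
Selecting several columns takes a list of column names inside the indexing brackets, which is why the call ends up with double brackets. A minimal sketch on a made-up three-row frame (the `toy` DataFrame is an assumption, not the Pokemon dataset):

```python
import pandas as pd

# Invented sample data for illustration only
toy = pd.DataFrame({
    'Name': ['Bulbasaur', 'Charmander', 'Squirtle'],
    'Type 1': ['Grass', 'Fire', 'Water'],
    'HP': [45, 39, 44],
})

single = toy['Name']            # one label gives a Series
subset = toy[['Name', 'HP']]    # a list of labels gives a DataFrame

print(type(single).__name__)
print(type(subset).__name__)
```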

In [None]:
#############################
# Read each Row
#############################

print(df.iloc[1]) #iloc: integer location (zero-based position), so this is the second row


for index, row in df.iterrows():
    print(index, row)
    
    
for index, row in df.iterrows():
    print(index, row['Name'])

In [None]:
#############################
# Read multiple Rows
#############################

print(df.iloc[1:4])

In [None]:
################################
# Read a specific location (R,C)
################################

print(df.iloc[2, 1]) #row position 2, column position 1 (both zero-based)
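
`iloc` takes zero-based integer positions, row first and column second, and its slices exclude the end position. A small sketch on an invented frame (the `toy` data is an assumption, not taken from the Pokemon file):

```python
import pandas as pd

# Invented sample data for illustration only
toy = pd.DataFrame({
    'Name': ['Bulbasaur', 'Charmander', 'Squirtle', 'Pikachu'],
    'Type 1': ['Grass', 'Fire', 'Water', 'Electric'],
    'HP': [45, 39, 44, 35],
})

cell = toy.iloc[2, 1]   # row position 2, column position 1
rows = toy.iloc[1:4]    # row positions 1, 2 and 3 (the end is exclusive)

print(cell)
print(len(rows))
```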

In [None]:
################################
# Read data from custom filters
################################

#use loc with a boolean condition on any column
print(df.loc[ df['Type 1'] == 'Fire' ])  #print out all rows that have 'Type 1' == 'Fire'
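
Filters can also be combined. The sketch below, on invented data, builds two boolean masks and joins them with `&`; the names `toy`, `is_fire` and `strong_fire` and the HP threshold are hypothetical choices for illustration.

```python
import pandas as pd

# Invented sample data for illustration only
toy = pd.DataFrame({
    'Name': ['Charmander', 'Charizard', 'Squirtle', 'Vulpix'],
    'Type 1': ['Fire', 'Fire', 'Water', 'Fire'],
    'HP': [39, 78, 44, 38],
})

# Each comparison yields a boolean Series. Wrap comparisons in parentheses
# when combining them, because & binds more tightly than == and >.
is_fire = toy['Type 1'] == 'Fire'
strong_fire = toy.loc[is_fire & (toy['HP'] > 38)]

print(strong_fire['Name'].tolist())
```

Use `|` for "or" and `~` for "not" in the same way; the Python keywords `and`, `or` and `not` do not work element-wise on a Series.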

# 3. Describing Data
The **df.describe()** method gives a quick summary of the numeric columns:
- count
- mean
- std
- min
- 25th percentile (25%)
- 50th percentile (50%, the median)
- 75th percentile (75%)
- max

In [None]:
df.describe()

In [None]:
# Getting a specific statistic, e.g. the mean value of 'HP'
# (describe().mean() would average the summary statistics themselves, not the data)
df['HP'].mean()
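
Calling `describe()` on a single column returns a Series keyed by statistic name, so individual values can be looked up by label. Calling `.mean()` on that summary averages the statistics themselves rather than the data. A minimal sketch with made-up HP values:

```python
import pandas as pd

hp = pd.Series([45, 39, 44, 35], name='HP')  # made-up values

summary = hp.describe()              # Series indexed by statistic name

print(list(summary.index))           # the statistics describe() reports
print(summary['mean'], hp.mean())    # both give the mean of the data
print(summary.mean())                # mean of the eight summary numbers, not of HP
```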

# 4. Sorting Data

In [None]:
################################
# Sort by One Attribute
################################
df.sort_values('Name') #sorts in alphabetical order

In [None]:
df.sort_values('Name', ascending = False)

In [None]:
################################
# Sort by multiple Attributes
################################
df.sort_values(['Type 1', 'HP']) #sorts by 'Type 1' first, then sort by 'HP'

In [None]:
df.sort_values(['Type 1', 'HP'], ascending=[True, False]) # 'Type 1' ascending, 'HP' descending order
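
`sort_values` returns a new, sorted DataFrame and leaves the original untouched unless the result is assigned back. A small sketch on invented data (the `toy` frame and `ordered` name are assumptions):

```python
import pandas as pd

# Invented sample data for illustration only
toy = pd.DataFrame({
    'Name': ['Squirtle', 'Charmander', 'Charizard', 'Wartortle'],
    'Type 1': ['Water', 'Fire', 'Fire', 'Water'],
    'HP': [44, 39, 78, 59],
})

# 'Type 1' ascending, then 'HP' descending within each type
ordered = toy.sort_values(['Type 1', 'HP'], ascending=[True, False])

print(ordered['Name'].tolist())
print(toy['Name'].tolist())   # original order is unchanged
```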

# 5. Making Changes to Data

In [None]:
################################
# Adding New Column into data
################################

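#Option 1: add the stat columns explicitly
df['Total'] = df['HP'] + df['Attack'] + df['Defense'] + df['Sp. Atk'] + df['Sp. Def'] + df['Speed']

#Option 2 (same result): sum the columns at positions 4 to 9, i.e. from 'HP' to 'Speed'
#axis=1 sums across the columns of each row; axis=0 sums down the rows of each column
df['Total'] = df.iloc[:, 4:10].sum(axis=1)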
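
The explicit sum and the row-wise `sum(axis=1)` give the same totals. The sketch below checks this on an invented frame with fewer stat columns; it also shows a label-based slice with `loc`, which is an alternative not used in the notes above and does not depend on column positions.

```python
import pandas as pd

# Invented sample data with only three stat columns
toy = pd.DataFrame({
    'Name': ['A', 'B'],
    'HP': [10, 20],
    'Attack': [1, 2],
    'Speed': [5, 6],
})

by_hand = toy['HP'] + toy['Attack'] + toy['Speed']
by_position = toy.iloc[:, 1:4].sum(axis=1)         # position slices exclude the end
by_label = toy.loc[:, 'HP':'Speed'].sum(axis=1)    # label slices include the end

print(by_hand.tolist())
print(by_position.tolist())
print(by_label.tolist())
```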

In [None]:
################################
# Remove a specific Column
################################

df = df.drop(columns = ['Total']) #drop returns a new DataFrame, so assign the result back to df
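
A short sketch of why the assignment is needed: `drop` leaves the DataFrame it was called on unchanged and hands back a copy without the column. The `toy` and `trimmed` names and the values are made up for illustration.

```python
import pandas as pd

# Invented sample data for illustration only
toy = pd.DataFrame({'Name': ['A', 'B'], 'HP': [10, 20], 'Total': [16, 28]})

trimmed = toy.drop(columns=['Total'])   # returns a new DataFrame

print(list(trimmed.columns))
print(list(toy.columns))                # toy still has 'Total'
```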