# Welcome to Pandas basics!

### What to expect after going through this notebook :
* A good understanding of Pandas and its working
* Load data from external sources into the notebook
* Make sense out of the data
* Prepare the data for the Machine Learning algorithms

### What not to expect :
* You won't become a data-importing / data-cleaning ninja just by working through this notebook

### Pre-requisites for this tutorial : 
* Basic Python, teensy bit of numpy
* A teensy bit of GoT / ASOIAF knowledge will help to fully understand the task at the end of this tutorial :)

### Strong Recommendation :
* Keep looking into the documentation of this package simultaneously as you progress. There's loads of functionality under the hood that I'm not covering here.

## What is Pandas?

### Definition 
***pandas*** is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

### Explain Like I'm Five Definition - 
***pandas*** is a data handling package which [someone](https://pandas.pydata.org/) spent months developing so that you can get your job done in seconds.

### Why Pandas?
* Makes data importing easier
* Makes data organizing easier
* Makes data filtering easier
* Makes data exporting easier

In short, it makes life as a data scientist easier.

## What are the magic words?
To import this package into IPython - 

In [None]:
import pandas as pd

Why import as pd? Because *#conventions*.

Now let's dive in...

## 1. Building DataFrames with Pandas

### 1.1 - DataFrames from Python Dictionaries

In [None]:
dog_dict = {
    'name': ['Freddie','Bruno'], 
    'age': [9, 7], 
    'is_vaccinated': [True, False], 
    'height': [1.1, 2.3],  
    'birth_year': [2001, 2003]
}

dog_dataframe = pd.DataFrame(dog_dict)
dog_dataframe

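Cool! On older versions of Python and pandas, the columns may come out in alphabetical order instead of the order used in the dictionary. Passing `columns` explicitly fixes the order: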

In [None]:
dog_dataframe = pd.DataFrame(dog_dict, columns = dog_dict.keys())
dog_dataframe

### 1.2 - DataFrames from Python Lists

In [None]:
name = ['Freddie','Bruno']
age = [9, 7]
is_vaccinated = [True, False]
height = [1.1, 2.3]
birth_year = [2001, 2003]

list_labels = ['name', 'age', 'is_vaccinated', 'height', 'birth_year']
list_cols = [name, age, is_vaccinated, height, birth_year]
z = list(zip(list_labels, list_cols))
print(z)

In [None]:
dog_data = dict(z)
dog_dataframe2 = pd.DataFrame(dog_data, columns = list_labels)  # reuse the labels defined above
dog_dataframe2

## 2. Importing data with Pandas

### 2.1 - Dataframes from CSV files

In [None]:
my_first_dataframe = pd.read_csv('got-character-deaths.csv')
my_first_dataframe
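
`pd.read_csv` accepts a file path or any file-like object, plus optional parameters that control what gets loaded. Here is a minimal sketch using a made-up in-memory CSV (the rows and the `csv_text` / `sample` names are hypothetical, not taken from `got-character-deaths.csv`):

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk
csv_text = """Name,Allegiances,Death Year
Character A,House X,299
Character B,House Y,300
Character C,House X,
"""

# usecols keeps only the listed columns; nrows limits how many data rows are read
sample = pd.read_csv(io.StringIO(csv_text), usecols=['Name', 'Death Year'], nrows=2)
print(sample)
```

The same keyword arguments work when the first argument is a file name.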

### 2.EX Try this out 
Import the `got-character-deaths1.csv` file into a pandas DataFrame and assign column labels to it

In [None]:
#your-code_goes-here

## 3. Export DataFrame using Pandas

### 3.1 To CSV File

In [None]:
my_first_dataframe.to_csv('my_first_export.csv')

### 3.2 To Excel File

In [None]:
my_first_dataframe.to_excel('my_second_export.xlsx')
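
By default `to_csv` also writes the row index as an unnamed first column. Passing `index=False` leaves it out. A small sketch on a made-up frame (the data and the `buffer` variable are illustrative), writing to an in-memory buffer so that no file is created:

```python
import io
import pandas as pd

df = pd.DataFrame({'name': ['Freddie', 'Bruno'], 'age': [9, 7]})

buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False drops the 0, 1, ... index column
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers the same columns and values
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip)
```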

### 3.EX Try this out 
Export `my_first_dataframe` to a CSV file, separated by `/`

In [None]:
#your-code_goes-here

## 4. DataFrame Basic Methods

In [None]:
a = my_first_dataframe
type(a)

### .info()
data-frame.**info()** -> Displays useful information about the dataframe

In [None]:
#Try it out
a.info()

### .shape
data-frame.shape -> Shows the no. of rows and columns

In [None]:
#Try it out
a.shape

### .columns
data-frame.columns -> Displays an Index object containing the column names

In [None]:
#Try it out
a.columns
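
Both `.shape` and `.columns` are attributes, not methods, so they take no parentheses. A quick sketch on a small made-up DataFrame (the `dogs` data is illustrative):

```python
import pandas as pd

dogs = pd.DataFrame({'name': ['Freddie', 'Bruno'], 'age': [9, 7], 'height': [1.1, 2.3]})

n_rows, n_cols = dogs.shape      # shape is a (rows, columns) tuple
print(n_rows, n_cols)            # 2 3
print(list(dogs.columns))        # ['name', 'age', 'height']
```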

### .head() and .tail()
data-frame**.head(n)** -> Displays the first n rows of the dataframe

data-frame**.tail(n)** -> Displays the last n rows of the dataframe

If n is not mentioned - default value = 5

In [None]:
#Try them out
a.head()
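
To see how `n` and its default behave, here is a small sketch on a made-up ten-row DataFrame (the `numbers` frame is illustrative):

```python
import pandas as pd

numbers = pd.DataFrame({'n': range(10)})

first_three = numbers.head(3)   # rows 0, 1, 2
last_two = numbers.tail(2)      # rows 8, 9
default_head = numbers.head()   # n omitted, so 5 rows

print(first_three)
print(last_two)
print(len(default_head))        # 5
```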

### .describe()
data-frame**.describe()** -> Summary statistics (count, mean, standard deviation, min, quartiles, max) for the numeric columns

data-frame['column-name']**.describe()** -> For a non-numeric column, summary values like - 
* count -> number of non-null entries
* unique -> number of distinct values
* top -> entry with the highest frequency of occurrence
* freq -> number of times top occurs


In [None]:
a.describe()

In [None]:
a['Allegiances'].describe()
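
The two flavours of `.describe()` output can be compared side by side. A minimal sketch on made-up data (the `dogs` frame and its values are illustrative):

```python
import pandas as pd

dogs = pd.DataFrame({
    'breed': ['lab', 'pug', 'lab', 'lab'],
    'age': [9, 7, 3, 5],
})

numeric_summary = dogs['age'].describe()   # count, mean, std, min, quartiles, max
text_summary = dogs['breed'].describe()    # count, unique, top, freq

print(numeric_summary)
print(text_summary)
```

Individual values can be pulled out by label, for example `text_summary['top']`.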

### Indexing DataFrames - .iloc[ ]
data-frame.**iloc[]** -> retrieve a particular set of rows and columns

#### What .iloc[ ] accepts as arguments ->
* integer -> 3
* list of integers -> [3,5,7,8]
* slice objects -> 0:9

Eg:

* data-frame**.iloc[4]**                    -> Gets row 4
* data-frame**.iloc[[2,3,4,5], [2,4]]**     -> Gets rows 2,3,4,5 with columns 2 and 4
* data-frame**.iloc[:5, 1:3]**              -> Gets rows 0 to 4 with columns 1 to 2

*Remember - if slice is a:b, then a is included, b is excluded*

In [None]:
a.iloc[:5, 1:3]  # rows at positions 0 to 4, columns at positions 1 and 2
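
Because `.iloc[]` works purely by position, it ignores whatever labels the index carries. A small sketch on a made-up DataFrame with letters as row labels (the `grid` frame is illustrative):

```python
import pandas as pd

grid = pd.DataFrame(
    {'a': [0, 1, 2, 3], 'b': [10, 11, 12, 13], 'c': [20, 21, 22, 23]},
    index=['w', 'x', 'y', 'z'],
)

row = grid.iloc[1]                   # second row, returned as a Series
block = grid.iloc[:2, 1:3]           # rows at positions 0-1, columns at positions 1-2
picked = grid.iloc[[0, 3], [0, 2]]   # rows at positions 0 and 3, columns at positions 0 and 2

print(row)
print(block)
print(picked)
```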

### Indexing DataFrames - .loc[ ]
data-frame.**loc[]** -> retrieve a particular set of rows and columns by label

#### What .loc[ ] accepts as arguments ->
* single label -> 3 (a row label; it matches the row position only with the default index)
* list of labels -> ['Death Year', 'Death Chapter']
* slice of labels -> 0:9

Eg:

* data-frame**.loc[:5, ['Death Year', 'Death Chapter']]**   -> Gets rows labeled 0 to 5 with columns labeled Death Year and Death Chapter
* data-frame**.loc[[2,3,4,5], ['Name', 'Allegiances']]**    -> Gets rows 2,3,4,5 with columns Name and Allegiances
* data-frame**.loc[:5, 'Death Year':'Death Chapter']**      -> Gets rows 0 to 5 with every column from Death Year through Death Chapter

*Remember - if slice is a:b, then a and b both are included*

In [None]:
a.loc[:5, ['Death Year', 'Death Chapter']]  # rows labeled 0 to 5 (inclusive), two columns by name
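
The inclusive end of a `.loc[]` slice is easy to miss, so here is a sketch comparing it with `.iloc[]` on made-up data (the `numbers` and `grid` frames are illustrative):

```python
import pandas as pd

numbers = pd.DataFrame({'n': range(10)})    # default index: 0, 1, ..., 9
with_loc = numbers.loc[:5]                  # labels 0 to 5, so 6 rows
with_iloc = numbers.iloc[:5]                # positions 0 to 4, so 5 rows
print(len(with_loc), len(with_iloc))        # 6 5

grid = pd.DataFrame(
    {'a': [0, 1, 2, 3], 'b': [10, 11, 12, 13], 'c': [20, 21, 22, 23]},
    index=['w', 'x', 'y', 'z'],
)
by_label = grid.loc['x':'y', ['a', 'c']]    # both 'x' and 'y' are included
by_position = grid.iloc[1:2, [0, 2]]        # position 2 is excluded, so only 'x'
print(by_label)
print(by_position)
```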

### Indexing DataFrames - Get a column as Series Object
data-frame**[*column_name*]** -> retrieves a single column as a Series object

Eg:

* data-frame['Death Year'] -> Fetches the column as a Series object
* data-frame[['Death Year', 'Death Chapter']] -> Fetches both the mentioned columns as a DataFrame (note the double brackets)

*Note - a Series object can contain data of mixed types*
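
Whether you get a Series or a DataFrame back depends on the brackets: a single label gives a Series, while a list of labels (even a list of one) gives a DataFrame. A quick check on made-up data (the `dogs` frame is illustrative):

```python
import pandas as pd

dogs = pd.DataFrame({'name': ['Freddie', 'Bruno'], 'age': [9, 7], 'height': [1.1, 2.3]})

one_column = dogs['age']               # single label -> Series
one_column_frame = dogs[['age']]       # list with one label -> DataFrame
two_columns = dogs[['name', 'age']]    # list of labels -> DataFrame

print(type(one_column).__name__)        # Series
print(type(one_column_frame).__name__)  # DataFrame
print(type(two_columns).__name__)       # DataFrame
```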

## 5. Boolean Indexing and Math
* less-than-seven-filter = **data-frame['column-name'] < 7** -> Returns a Series of Boolean values, one per row of *"column-name"*, and stores it in less-than-seven-filter

To get only the rows that satisfy the condition, pass this Series as an index to the DataFrame
* filtered-data-frame = **data-frame[less-than-seven-filter]**

*Note - Avoid using **filter** as a variable name: it is a built-in function in Python, and reusing the name hides the built-in*
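
Here is a small sketch of the idea on made-up data (the `dogs` frame and the variable names are illustrative). It also shows two handy extras: conditions can be combined with `&`, `|` and `~`, and summing a Boolean Series counts its `True` values.

```python
import pandas as pd

dogs = pd.DataFrame({
    'name': ['Freddie', 'Bruno', 'Rex'],
    'age': [9, 7, 3],
    'is_vaccinated': [True, False, True],
})

under_eight = dogs['age'] < 8      # Boolean Series: False, True, True
young_dogs = dogs[under_eight]     # keeps only the rows where the Series is True

# Combine conditions with & (and), | (or), ~ (not); wrap comparisons in parentheses
young_and_vaccinated = dogs[(dogs['age'] < 8) & dogs['is_vaccinated']]

print(young_dogs)
print(young_and_vaccinated)
print(under_eight.sum())           # True counts as 1, so this prints 2
```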

In [None]:
#Your code goes here

## 6. Visual Analysis of data

* data-frame**.plot(x = 'x-column', y = 'y-column')** -> line plot by default
* data-frame**.plot(x = 'x-column', y = 'y-column', kind = 'scatter')** -> scatter plot
        arguments for kind - 
            ‘line’ : line plot (default)
            ‘bar’ : vertical bar plot
            ‘hist’ : histogram
            ‘box’ : boxplot
            ‘pie’ : pie plot
            ‘scatter’ : scatter plot



Using the functionality of **matplotlib.pyplot** library - 

1. Labeling the axes and title - 
    * **plt.xlabel('x-label')**
    * **plt.ylabel('y-label')**
    * **plt.title('title-of-the-plot')**

2. Showing the result of the plot - 
    * **plt.show()**
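
Putting the pieces together, a minimal sketch on made-up data might look like this (the `dogs` frame and the label text are illustrative, and it assumes matplotlib is installed):

```python
import pandas as pd
import matplotlib.pyplot as plt

dogs = pd.DataFrame({'age': [3, 5, 7, 9], 'height': [0.9, 1.4, 2.3, 1.1]})

# DataFrame.plot returns the matplotlib Axes it drew on
ax = dogs.plot(x='age', y='height', kind='scatter')

# The plt functions act on the current axes, which is the one just created
plt.xlabel('Age (years)')
plt.ylabel('Height')
plt.title('Height against age')
plt.show()
```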

In [None]:
import matplotlib.pyplot as plt #conventions

#Your code goes here

# Winter is here...

Here is a small exercise to test your understanding...

## Three Eyed Raven's Visions (Dataset Column Description) - 

* Name - Character name
* Allegiances - Character house
* Death Year - Year character died
* Book of Death - Book in which the character died
* Death Chapter - Chapter in which the character died
* Book Intro Chapter - Chapter character was introduced in
* Gender - 1 is male, 0 is female
* Nobility - 1 is noble, 0 is a commoner
* GoT - Appeared in first book
* CoK - Appeared in second book
* SoS - Appeared in third book
* FfC - Appeared in fourth book
* DwD - Appeared in fifth book

## White walkers that you have to slay (Tasks) - 

1. Get me the house with the most deaths
2. Get me the chapter in which Uncle Benjamin Stark dies.
3. Find the ratio of female to male deaths with allegiance to House Stark

**Bring out your Valyrian steel swords and have a swing at these white walkers...**

In [None]:
#Your battlefield for WW1 :)


In [None]:
#Your battlefield for WW2 :)


In [None]:
#Your battlefield for WW3 :)


## And with that, the long night ends... :)

![That's all folks! See you in the next topic.](folks.jpg)