# Pandas Python Library

In [1]:
import pandas as pd

Pandas is built around two core data structures: **DataFrame** and **Series**.

## DataFrame

A **DataFrame** is a table of data elements. Each entry sits at the intersection of a row and a column.

In [9]:
# The pd.DataFrame() constructor creates DataFrame objects, e.g.
pd.DataFrame({'Bob':['I liked it', 'It was awful.'], 'Sue':['Pretty good.', 'Product B']}, index=['Product A', 'Product B'])
# Each dictionary key becomes a column name, and its list holds that column's values.
# The index parameter replaces the auto-generated 0, 1, 2, ... row labels with custom ones.

Unnamed: 0,Bob,Sue
Product A,I liked it,Pretty good.
Product B,It was awful.,Product B
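To make the roles of the dictionary keys and the index parameter concrete, here is a minimal sketch. The variable names and the review strings are made up for illustration; only pd.DataFrame() and its index parameter come from the cell above.

```python
import pandas as pd

# Illustrative data: the review strings are invented for this sketch.
reviews = pd.DataFrame(
    {"Bob": ["I liked it", "It was awful."], "Sue": ["Pretty good.", "Bland."]},
    index=["Product A", "Product B"],
)

print(list(reviews.columns))  # column labels come from the dictionary keys
print(list(reviews.index))    # row labels come from the index parameter
print(reviews.shape)          # (number of rows, number of columns)

# Without index=, pandas falls back to the default 0, 1, 2, ... row labels.
default_index = pd.DataFrame({"Bob": ["I liked it", "It was awful."]})
print(list(default_index.index))
```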


## Series

A **Series** is a one-dimensional, labeled sequence of values, much like a single column, whereas a DataFrame is a two-dimensional table.

In [19]:
# e.g.
pd.Series([30, 45, 40, 35], index=['2015 Data', '2016 Data', '2017 Data', '2018 Data'], name='Product A Records')
# The index parameter works as it does for a DataFrame; name labels the Series as a whole.

2015 Data    30
2016 Data    45
2017 Data    40
2018 Data    35
Name: Product A Records, dtype: int64
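The link between a Series and a DataFrame column can be sketched as follows. This reuses the values from the cell above; the variable names (sales, frame, column) are hypothetical.

```python
import pandas as pd

sales = pd.Series(
    [30, 45, 40, 35],
    index=["2015 Data", "2016 Data", "2017 Data", "2018 Data"],
    name="Product A Records",
)

# Entries can be looked up by label or by integer position.
print(sales["2016 Data"])
print(sales.iloc[0])

# Converting to a DataFrame turns the Series name into the column name.
frame = sales.to_frame()
print(frame.columns.tolist())

# Selecting that single column gives back a Series.
column = frame["Product A Records"]
print(type(column).__name__)
```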

To **load an existing dataset from a CSV file**, use the pd.read_csv() function.

In [20]:
main_data = pd.read_csv('Categorical.csv')

In [21]:
# The shape attribute (it is not a method, so no parentheses) gives the DataFrame size as (rows, columns).
main_data.shape

(241, 4)
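Since Categorical.csv may not be at hand, the loading step can be tried on an in-memory sample. This is a sketch under that assumption: the three rows are copied from the head() output of this notebook, and io.StringIO stands in for a file on disk.

```python
import io
import pandas as pd

# A small in-memory stand-in for a CSV file.
csv_text = """name,Longitude,Latitude,continent
Aruba,-69.982677,12.52088,North America
Afghanistan,66.004734,33.835231,Asia
Angola,17.537368,-12.293361,Africa
"""

# read_csv accepts a path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

# shape is a tuple that can be unpacked directly.
rows, cols = sample.shape
print(rows, cols)
print(sample.columns.tolist())
```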

In [22]:
main_data.head()
# head() is commonly used for a first look at the data. It shows the first 5 rows by default; pass a number, e.g. head(10), to show more.

Unnamed: 0,name,Longitude,Latitude,continent
0,Aruba,-69.982677,12.52088,North America
1,Afghanistan,66.004734,33.835231,Asia
2,Angola,17.537368,-12.293361,Africa
3,Anguilla,-63.064989,18.223959,North America
4,Albania,20.049834,41.14245,Europe


In [24]:
# To write a DataFrame to disk, use the to_csv() method.
# e.g. main_data.to_csv("testing_write.csv") saves main_data as testing_write.csv in the current directory.
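One detail worth knowing about to_csv() is that it also writes the row index by default, which shows up as an extra unnamed column when the file is read back. The sketch below demonstrates this with in-memory buffers instead of real files; the small frame and the buffer names are hypothetical.

```python
import io
import pandas as pd

frame = pd.DataFrame(
    {"name": ["Aruba", "Angola"], "continent": ["North America", "Africa"]}
)

# Default behaviour: the index is written as a first column with no header.
with_index = io.StringIO()
frame.to_csv(with_index)
with_index.seek(0)
columns_with_index = pd.read_csv(with_index).columns.tolist()
print(columns_with_index)

# index=False leaves the row labels out of the file.
without_index = io.StringIO()
frame.to_csv(without_index, index=False)
without_index.seek(0)
columns_without_index = pd.read_csv(without_index).columns.tolist()
print(columns_without_index)
```

Passing index=False is a reasonable default when the index is just the auto-generated 0, 1, 2, ... sequence.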

# Inserting, Selecting and Assigning

In [25]:
main_data

Unnamed: 0,name,Longitude,Latitude,continent
0,Aruba,-69.982677,12.520880,North America
1,Afghanistan,66.004734,33.835231,Asia
2,Angola,17.537368,-12.293361,Africa
3,Anguilla,-63.064989,18.223959,North America
4,Albania,20.049834,41.142450,Europe
...,...,...,...,...
236,Samoa,-172.164851,-13.753243,Oceania
237,Yemen,47.586762,15.909280,Asia
238,South Africa,25.083901,-29.000341,Africa
239,Zambia,27.774759,-13.458242,Africa


In [26]:
# A column can be accessed as an attribute of the DataFrame.
main_data.name

0             Aruba
1       Afghanistan
2            Angola
3          Anguilla
4           Albania
           ...     
236           Samoa
237           Yemen
238    South Africa
239          Zambia
240        Zimbabwe
Name: name, Length: 241, dtype: object

In [27]:
# The same column can be selected with Python's dictionary-style indexing operator [].
main_data['name']

0             Aruba
1       Afghanistan
2            Angola
3          Anguilla
4           Albania
           ...     
236           Samoa
237           Yemen
238    South Africa
239          Zambia
240        Zimbabwe
Name: name, Length: 241, dtype: object

In [33]:
# Indexing the resulting Series with [0] then returns the entry labelled 0.
main_data['name'][0]

'Aruba'
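Attribute access and bracket access are not always interchangeable. The sketch below uses a hypothetical column called size to show a case where only the brackets work, because DataFrame already has an attribute of that name.

```python
import pandas as pd

# "size" is an invented column, chosen because it clashes with DataFrame.size.
frame = pd.DataFrame(
    {
        "name": ["Aruba", "Angola"],
        "continent": ["North America", "Africa"],
        "size": ["small", "large"],
    }
)

# For an ordinary column name both spellings return the same Series.
print(frame.name.equals(frame["name"]))

# frame.size is the built-in element count (rows * columns), not the column.
print(frame.size)

# Bracket access always reaches the column.
print(frame["size"].tolist())
```

Bracket access is also the only option for column names that contain spaces or other characters that are not valid in a Python identifier.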

## Pandas Indexing

## Index-Based Selection

In [34]:
# Pandas has its own selection operators, loc and iloc. iloc selects by integer position.
main_data.iloc[0] # Select the first row

name                 Aruba
Longitude       -69.982677
Latitude          12.52088
continent    North America
Name: 0, dtype: object

In [35]:
main_data.iloc[:,0] # Select every row of the first column

0             Aruba
1       Afghanistan
2            Angola
3          Anguilla
4           Albania
           ...     
236           Samoa
237           Yemen
238    South Africa
239          Zambia
240        Zimbabwe
Name: name, Length: 241, dtype: object
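iloc also accepts slices, negative positions, and lists of positions, following the usual Python conventions. A minimal sketch on a five-row sample taken from the head() output of this notebook (the variable names are hypothetical):

```python
import pandas as pd

frame = pd.DataFrame(
    {
        "name": ["Aruba", "Afghanistan", "Angola", "Anguilla", "Albania"],
        "continent": ["North America", "Asia", "Africa", "North America", "Europe"],
    }
)

# Slices exclude the end position, as in plain Python: rows 0 and 1.
first_two = frame.iloc[:2]

# Negative positions count from the end.
last_row = frame.iloc[-1]

# A (row, column) pair of positions selects a single cell.
cell = frame.iloc[1, 0]

# A list of positions picks arbitrary rows; here rows 0 and 2 of column 1.
picked = frame.iloc[[0, 2], 1]

print(first_two["name"].tolist(), last_row["name"], cell, picked.tolist())
```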

## Label-based Selection

In [36]:
main_data.loc[0,'name'] # loc selects by label: row label 0, column label 'name'

'Aruba'
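With the default 0, 1, 2, ... index, loc and iloc can look alike, so the difference is easier to see on a frame with string row labels. The sketch below assumes such a frame (built by hand here, not taken from Categorical.csv) and highlights that loc slices include the end label while iloc slices exclude the end position.

```python
import pandas as pd

frame = pd.DataFrame(
    {"continent": ["North America", "Asia", "Africa", "North America"]},
    index=["Aruba", "Afghanistan", "Angola", "Anguilla"],
)

# loc works with labels, and a label slice includes both endpoints.
by_label = frame.loc["Aruba":"Angola"]

# iloc works with positions, and a position slice excludes the end.
by_position = frame.iloc[0:2]

# A single cell by row label and column label.
angola_continent = frame.loc["Angola", "continent"]

print(len(by_label), len(by_position), angola_continent)
```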