## Pandas for Data Manipulation

The name pandas is derived from "panel data", an econometrics term for structured, multidimensional datasets. The library is used for data wrangling, manipulation, analysis, and visualization.

Pandas is built on the NumPy package, so its underlying data structures are array-like.

Pandas has two main data structures:

* **Pandas Series** - a one-dimensional, array-like data type
* **Pandas DataFrame** - a two-dimensional data type for manipulating tabular data
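
As a rough illustration of the difference between the two structures, the sketch below builds one of each and checks its dimensions. The `stock` numbers are made up for this example and do not come from the notebook.

```python
import pandas as pd

# A Series holds one labeled sequence of values (1-D).
prices = pd.Series([200, 250, 300], index=['coke', 'malt', 'nutri-milk'])

# A DataFrame holds several labeled columns that share one index (2-D).
# The 'stock' column is hypothetical sample data.
table = pd.DataFrame({'price': [200, 250, 300], 'stock': [12, 5, 8]},
                     index=['coke', 'malt', 'nutri-milk'])

print(prices.ndim, prices.shape)   # 1 (3,)
print(table.ndim, table.shape)     # 2 (3, 2)
```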

A third, rarely used data structure was the Panel, a container for 3-D data. It was deprecated and then removed in pandas 0.25, so current versions of pandas no longer provide it.

Contact: bolu.oludupin@gmail.com

## Preparing your Pandas Environment

If you do not have pandas installed, you can use

      pip install pandas

to install pandas into your environment.


Next, we have to import pandas into our working environment.


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Pandas Series Object

A Series is like a 1-D NumPy array with a labeled index. Like a NumPy array, it stores all of its values under a single data type.
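
To see what a single data type means in practice, the following sketch (with made-up values) shows how pandas picks one dtype for the whole Series, widening it when the inputs are mixed.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])                # all integers
mixed_numbers = pd.Series([1, 2.5, 3])     # integers and a float
mixed_types = pd.Series([1, 'two', 3.0])   # numbers and a string

print(ints.dtype)            # an integer dtype
print(mixed_numbers.dtype)   # float64: the integers are upcast to floats
print(mixed_types.dtype)     # object: a generic container for arbitrary Python values
```

The `object` dtype still counts as one type from pandas' point of view, but it loses the speed benefits of a numeric array.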


### Creating a Pandas Series Object

### 1. Creating Series Objects from Dictionaries

In [None]:
dict1 = {'coke':200, 'malt': 250, 'nutri-milk': 300, 'zobo':100, 'fearless': 300}

# create a pandas Series using pd.Series

drinks = pd.Series(dict1)

drinks


coke          200
malt          250
nutri-milk    300
zobo          100
fearless      300
dtype: int64

In [None]:
drinks.index

Index(['coke', 'malt', 'nutri-milk', 'zobo', 'fearless'], dtype='object')

In [None]:
drinks.values

array([200, 250, 300, 100, 300])

In [None]:
drinks['coke']

200

In [None]:
dict2 = {0: 950, 1: 1350, 2: 2500, 3: 700}

more_drinks = pd.Series(dict2, index = ['chi exotic', 'eva wine', 'vita milk', 'chivita'])

more_drinks

chi exotic   NaN
eva wine     NaN
vita milk    NaN
chivita      NaN
dtype: float64
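
Every value above is `NaN` because, when a dictionary is passed together with `index`, pandas looks up each index label among the dictionary keys. The keys here are `0` to `3`, none of which match the drink names. A minimal sketch of two ways around this, using the same data (the variable names are chosen for this example):

```python
import pandas as pd

dict2 = {0: 950, 1: 1350, 2: 2500, 3: 700}
labels = ['chi exotic', 'eva wine', 'vita milk', 'chivita']

# Keys 0..3 do not appear in `labels`, so every lookup misses and yields NaN.
misaligned = pd.Series(dict2, index=labels)

# Option 1: pass only the values, so they are assigned to the labels by position.
by_position = pd.Series(list(dict2.values()), index=labels)

# Option 2: use the labels as the dictionary keys from the start.
by_key = pd.Series(dict(zip(labels, dict2.values())))

print(by_position)
```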

### 2. Creating Series Objects from Lists

In [None]:
index = ['chi exotic', 'eva wine', 'vita milk', 'chivita']
list1 = [950, 1350, 2500, 700]
more_drinks = pd.Series(list1, index = index)

more_drinks

chi exotic     950
eva wine      1350
vita milk     2500
chivita        700
dtype: int64

In [None]:
xc = pd.Series(index, name = 'soft_drinks')

In [None]:
xc

0    chi exotic
1      eva wine
2     vita milk
3       chivita
Name: soft_drinks, dtype: object

### Combining Pandas Series Objects

In [None]:
print(drinks)
print(more_drinks, end='\n\n')

soft_drinks = pd.concat([drinks,more_drinks])

print(soft_drinks)

coke          200
malt          250
nutri-milk    300
zobo          100
fearless      300
dtype: int64
chi exotic     950
eva wine      1350
vita milk     2500
chivita        700
dtype: int64

coke           200
malt           250
nutri-milk     300
zobo           100
fearless       300
chi exotic     950
eva wine      1350
vita milk     2500
chivita        700
dtype: int64
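
`pd.concat` simply stacks the inputs, so labels that occur in more than one Series are kept as duplicates. The sketch below uses two small hypothetical Series with an overlapping label to show the default behavior and two optional arguments, `ignore_index` and `verify_integrity`, that change it.

```python
import pandas as pd

a = pd.Series({'coke': 200, 'malt': 250})
b = pd.Series({'malt': 300, 'zobo': 100})   # 'malt' appears in both

# Default: both 'malt' entries are kept.
combined = pd.concat([a, b])
print(combined['malt'])   # a two-row Series, not a single number

# ignore_index=True discards the labels and renumbers from 0.
renumbered = pd.concat([a, b], ignore_index=True)

# verify_integrity=True raises instead of silently keeping duplicates.
try:
    pd.concat([a, b], verify_integrity=True)
except ValueError as err:
    print('duplicate labels rejected:', err)
```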


### Some Indexing and Slicing



In [None]:
soft_drinks['fearless'] = 350

soft_drinks['fearless']

350

In [None]:
soft_drinks['coke': 'zobo']

coke          200
malt          250
nutri-milk    300
zobo          100
dtype: int64
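
Note that the label slice above includes its end point, `'zobo'`, unlike ordinary Python slicing. As a small sketch, the explicit `.loc` (label-based) and `.iloc` (position-based) accessors make the difference visible; the Series is rebuilt here so the example runs on its own.

```python
import pandas as pd

soft_drinks = pd.Series({'coke': 200, 'malt': 250, 'nutri-milk': 300,
                         'zobo': 100, 'fearless': 350})

# Label-based slice: both end points are included.
by_label = soft_drinks.loc['coke':'zobo']

# Position-based slice: the stop position is excluded, as in plain Python.
by_position = soft_drinks.iloc[0:3]

print(list(by_label.index))      # ['coke', 'malt', 'nutri-milk', 'zobo']
print(list(by_position.index))   # ['coke', 'malt', 'nutri-milk']
```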

In [None]:
soft_drinks.iloc[6]  # position 6 ('eva wine'); plain soft_drinks[6] is deprecated on a label-indexed Series

1350
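
An integer inside plain square brackets can be read either as a label or as a position, depending on the index, which is why the explicit accessors are usually safer. The following sketch uses a hypothetical Series whose integer labels are deliberately out of order to show the two meanings side by side.

```python
import pandas as pd

# Hypothetical Series: the labels are integers, but not in positional order.
s = pd.Series([10, 20, 30], index=[2, 1, 0])

by_label = s.loc[0]      # the value stored under label 0
by_position = s.iloc[0]  # the first value in storage order

print(by_label)      # 30
print(by_position)   # 10
```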

In [None]:
type(soft_drinks)   # not displayed: a notebook cell shows only the value of its last expression
soft_drinks.dtype

dtype('int64')

## Pandas DataFrames

A DataFrame is a 2-D labeled data structure. Each column is a Series, and all columns share the same index.
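
To make the relationship between the two types concrete, here is a minimal sketch that assembles a DataFrame from two small Series and then pulls one column back out. The variable names are chosen for this example only.

```python
import pandas as pd

clubs = pd.Series(['Chelsea', 'Liverpool'], index=[1, 2])
titles = pd.Series([2, 5], index=[1, 2])

# Each dictionary key becomes a column name; the Series are aligned on their index.
frame = pd.DataFrame({'clubs': clubs, 'UCL Titles': titles})

# Selecting a single column returns a Series that carries the same index.
column = frame['UCL Titles']

print(type(column).__name__)   # Series
print(frame.shape)             # (2, 2)
```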

### Creating DataFrames

### 1. Create DataFrames from a Dictionary of Lists

In [None]:
df_dict = {'clubs': ['Real Madrid', 'Barcelona', 'Inter Milan', 'AC Milan', 'Chelsea', 'Liverpool', 'Manchester United'],
           'UCL Titles': [14, 3, 3, 7, 2, 5, 3]}

df = pd.DataFrame(df_dict, index = np.arange(1,8))
df

               clubs  UCL Titles
1        Real Madrid          14
2          Barcelona           3
3        Inter Milan           3
4           AC Milan           7
5            Chelsea           2
6          Liverpool           5
7  Manchester United           3
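
Once a DataFrame exists, a few basic operations are enough to start exploring it. The sketch below rebuilds the same table and shows one possible way to look up a row by label, filter rows with a condition, and sort by a column; it is meant as a starting point rather than a complete tour.

```python
import numpy as np
import pandas as pd

df_dict = {'clubs': ['Real Madrid', 'Barcelona', 'Inter Milan', 'AC Milan',
                     'Chelsea', 'Liverpool', 'Manchester United'],
           'UCL Titles': [14, 3, 3, 7, 2, 5, 3]}
df = pd.DataFrame(df_dict, index=np.arange(1, 8))

print(df.shape)   # (7, 2): seven rows, two columns

# Look up one row by its index label; the result is a Series.
row = df.loc[4]

# Keep only the rows where the condition holds (boolean filtering).
top = df[df['UCL Titles'] > 3]

# Sort the rows by a column, largest value first.
ranked = df.sort_values('UCL Titles', ascending=False)

print(row['clubs'])         # AC Milan
print(list(top['clubs']))   # ['Real Madrid', 'AC Milan', 'Liverpool']
```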
