## 1. Creating Pandas Series

A pandas Series is a one-dimensional labeled array that can hold values of any data type.

In [None]:
import pandas as pd

- A Series can represent a single sample (one observation).

- Each index label in the example below (eggs, apples, milk, bread) is like a feature name.

- Each value (30, 6, 'Yes', 'No') is the feature value for this sample.

In [None]:
# Creating pandas series:
groceries = pd.Series(data=[30, 6, 'Yes', 'No'],
                      index=['eggs', 'apples', 'milk', 'bread'])
groceries

Unnamed: 0,0
eggs,30
apples,6
milk,Yes
bread,No


In [None]:
# Shape of pandas series:
groceries.shape

(4,)

In [None]:
# the index labels of the series object:
groceries.index

Index(['eggs', 'apples', 'milk', 'bread'], dtype='object')

In [None]:
# the data of the series object:
groceries.values

array([30, 6, 'Yes', 'No'], dtype=object)
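
The `dtype=object` above follows from mixing integers and strings in one Series. A minimal sketch of the difference, assuming a hypothetical numeric-only Series named `quantities` that is not part of the original notebook:

```python
import pandas as pd

# Mixed integers and strings: pandas falls back to the generic object dtype.
groceries = pd.Series(data=[30, 6, 'Yes', 'No'],
                      index=['eggs', 'apples', 'milk', 'bread'])

# Hypothetical numeric-only Series: pandas can use a compact integer dtype.
quantities = pd.Series(data=[30, 6], index=['eggs', 'apples'])

print(groceries.dtype)   # object
print(quantities.dtype)  # an integer dtype, int64 on most platforms
```

Keeping a Series to a single numeric type generally makes arithmetic on it faster and more predictable.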

## 2. Accessing and Deleting Elements in Pandas Series

### 2.1 Accessing

We can access elements by their index labels:

In [None]:
# access the quantity of eggs:
groceries['eggs']

30

We can get multiple elements by providing a list of index labels:

In [None]:
# access elements with list of labels:
groceries[['milk', 'bread']]

Unnamed: 0,0
milk,Yes
bread,No


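Another way to access elements is by numeric position, much like indexing a NumPy array. Since pandas 2.1, using plain square brackets with an integer position on a Series whose index is not integer-based emits a FutureWarning, and iloc is the recommended form: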

In [None]:
# access elements using indices:
groceries[0]

30

In [None]:
# access multiple elements using indices:
groceries[[0, 2]]

Unnamed: 0,0
eggs,30
milk,Yes


To remove any ambiguity about whether we are referring to an index label or a numeric position, a pandas Series provides two indexers, loc and iloc.

- **loc**: select by index label.

- **iloc**: select by integer position.

In [None]:
groceries.loc[['eggs', 'apples']]

Unnamed: 0,0
eggs,30
apples,6


In [None]:
groceries.iloc[[0, 2]]

Unnamed: 0,0
eggs,30
milk,Yes
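
The ambiguity is easiest to see when the labels are themselves integers. A small sketch, assuming a hypothetical Series `s` whose labels run in reverse order:

```python
import pandas as pd

# Hypothetical Series whose labels are integers in reverse order.
s = pd.Series(['a', 'b', 'c'], index=[2, 1, 0])

by_label = s.loc[0]      # the element whose label is 0
by_position = s.iloc[0]  # the first element, regardless of its label

print(by_label)     # c
print(by_position)  # a
```

Spelling out loc or iloc makes the intent explicit, so the code does not depend on how pandas interprets a bare integer key.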


### 2.2 Deleting

We can delete elements from a pandas Series by label with drop, which returns a new Series and leaves the original unchanged:

In [None]:
# drop the given labels; this returns a new Series (not in place):
groceries.drop(['apples', 'eggs'])

Unnamed: 0,0
milk,Yes
bread,No


In [None]:
groceries

Unnamed: 0,0
eggs,30
apples,6
milk,Yes
bread,No
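
Since drop does not modify the Series, the result has to be assigned to a name to be kept. A minimal sketch; the names `remaining` and `safe` and the label 'butter' are hypothetical and only serve the illustration:

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'],
                      index=['eggs', 'apples', 'milk', 'bread'])

# drop returns a new Series, so assign the result to keep it.
remaining = groceries.drop(['apples', 'eggs'])

# errors='ignore' skips labels that are not present instead of raising KeyError.
# 'butter' is a hypothetical label that does not exist in the Series.
safe = groceries.drop(['butter'], errors='ignore')

print(list(remaining.index))  # ['milk', 'bread']
print(len(groceries))         # 4, the original is untouched
print(len(safe))              # 4, nothing was removed
```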


## 3. Creating Pandas DataFrames


Create a DataFrame manually from a dictionary containing several pandas Series.

In [None]:
# Create a dictionary of Series:
items = {'Bob': pd.Series([245, 25, 55], index=['bike', 'pants', 'watch']),
         'Alice': pd.Series([40, 110, 500, 45], index=['book', 'glasses', 'bike', 'pants'])}

In [None]:
shopping_cards = pd.DataFrame(items)
shopping_cards

Unnamed: 0,Bob,Alice
bike,245.0,500.0
book,,40.0
glasses,,110.0
pants,25.0,45.0
watch,55.0,
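
The empty cells above are NaN values: pandas aligns the Series on their index labels, and any label missing from one Series has no value in that column. A short sketch that checks this on the same data:

```python
import pandas as pd

items = {'Bob': pd.Series([245, 25, 55], index=['bike', 'pants', 'watch']),
         'Alice': pd.Series([40, 110, 500, 45],
                            index=['book', 'glasses', 'bike', 'pants'])}
shopping_cards = pd.DataFrame(items)

# The row labels are the union of the labels of both Series.
print(sorted(shopping_cards.index))

# Bob has no 'book' entry, so that cell is NaN.
print(pd.isna(shopping_cards.loc['book', 'Bob']))  # True

# NaN is a float, so the integer columns are upcast to float64.
print(shopping_cards.dtypes)
```

This upcast is why 245 is displayed as 245.0 in the full DataFrame.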


In [None]:
# Create the dataframe with some specific cols:
bob_shopping_card = pd.DataFrame(items, columns=['Bob'])
bob_shopping_card

Unnamed: 0,Bob
bike,245
pants,25
watch,55


In [None]:
# Create the dataframe with some specific indexes:
sel_shopping_card = pd.DataFrame(items, index=['pants', 'book'])
sel_shopping_card

Unnamed: 0,Bob,Alice
pants,25.0,45
book,,40


In [None]:
# Create a DataFrame from a dictionary of lists:
data = {'Integers': [1, 2, 3],
        'Floats': [4.5, 8.2, 9.1]}
df = pd.DataFrame(data)
df

Unnamed: 0,Integers,Floats
0,1,4.5
1,2,8.2
2,3,9.1


In [None]:
data = {'Integers': [1, 2, 3],
        'Floats': [4.5, 8.2, 9.1]}
df = pd.DataFrame(data, index=['label 1', 'label 2', 'label 3'])
df

Unnamed: 0,Integers,Floats
label 1,1,4.5
label 2,2,8.2
label 3,3,9.1


In [None]:
# data shape:
shopping_cards.shape

(5, 2)

In [None]:
# data size:
shopping_cards.size

10
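
The two attributes are related: size is the number of rows times the number of columns, and it counts NaN cells as well. A quick check, with `n_rows` and `n_cols` as illustrative names:

```python
import pandas as pd

items = {'Bob': pd.Series([245, 25, 55], index=['bike', 'pants', 'watch']),
         'Alice': pd.Series([40, 110, 500, 45],
                            index=['book', 'glasses', 'bike', 'pants'])}
shopping_cards = pd.DataFrame(items)

n_rows, n_cols = shopping_cards.shape
print(n_rows, n_cols)                          # 5 2
print(shopping_cards.size == n_rows * n_cols)  # True
print(shopping_cards.ndim)                     # 2
```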

## 4. Manipulating DataFrames


In [None]:
# Create a dataframe:
items = [{'bikes': 20, 'pants': 30, 'watches': 35},
         {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants': 5}]
store_items = pd.DataFrame(items, index=['store 1', 'store 2'])
store_items

Unnamed: 0,bikes,pants,watches,glasses
store 1,20,30,35,
store 2,15,5,10,50.0


### 4.1 Accessing

In [None]:
# access the bike col:
store_items[['bikes']]

Unnamed: 0,bikes
store 1,20
store 2,15


In [None]:
# access multiple cols:
store_items[['bikes', 'pants']]

Unnamed: 0,bikes,pants
store 1,20,30
store 2,15,5


In [None]:
# access the row store 1:
store_items.loc[['store 1']]

Unnamed: 0,bikes,pants,watches,glasses
store 1,20,30,35,


In [None]:
# access a specific col and row (column label first, then row label):
store_items['bikes']['store 2']

np.int64(15)
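
The same cell can also be read with a single loc call, which takes the row label first and the column label second. A minimal sketch comparing both forms, with `chained` and `direct` as illustrative names:

```python
import pandas as pd

items = [{'bikes': 20, 'pants': 30, 'watches': 35},
         {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants': 5}]
store_items = pd.DataFrame(items, index=['store 1', 'store 2'])

# Chained indexing: select the column first, then the row.
chained = store_items['bikes']['store 2']

# Single .loc call: row label first, then column label.
direct = store_items.loc['store 2', 'bikes']

print(chained == direct)  # True
```

A single loc call is usually the safer choice when assigning values, because chained indexing may operate on a temporary copy.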

### 4.2 Adding

In [None]:
# Adding a col:
store_items['shirts'] = [15, 12]
store_items

Unnamed: 0,bikes,pants,watches,glasses,shirts
store 1,20,30,35,,15
store 2,15,5,10,50.0,12


In [None]:
# adding a combination of cols:
store_items['suits'] = store_items['shirts'] + store_items['pants']
store_items

Unnamed: 0,bikes,pants,watches,glasses,shirts,suits
store 1,20,30,35,,15,45
store 2,15,5,10,50.0,12,17


In [None]:
# Adding a row:
new_items = [{"bikes": 20, 'pants': 30, 'watches': 35, 'glasses': 4}]
new_store = pd.DataFrame(new_items, index=['store 3'])
store_items = pd.concat([store_items, new_store])
store_items

Unnamed: 0,bikes,pants,watches,glasses,shirts,suits
store 1,20,30,35,,15.0,45.0
store 2,15,5,10,50.0,12.0,17.0
store 3,20,30,35,4.0,,


### 4.3 Deleting

In [None]:
# delete columns:
store_items = store_items.drop(['watches', 'suits'], axis=1)
store_items

Unnamed: 0,bikes,pants,glasses,shirts
store 1,20,30,,15.0
store 2,15,5,50.0,12.0
store 3,20,30,4.0,


In [None]:
# delete rows:
store_items = store_items.drop(['store 1', 'store 3'], axis=0)
store_items

Unnamed: 0,bikes,pants,glasses,shirts
store 2,15,5,50.0,12.0
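
Instead of the axis argument, drop also accepts the columns and index keywords, which state directly what is being removed. A sketch on a fresh copy of the two-store data, with illustrative result names:

```python
import pandas as pd

items = [{'bikes': 20, 'pants': 30, 'watches': 35},
         {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants': 5}]
store_items = pd.DataFrame(items, index=['store 1', 'store 2'])

# columns= is equivalent to passing the labels with axis=1.
without_watches = store_items.drop(columns=['watches'])

# index= is equivalent to passing the labels with axis=0.
only_store_2 = store_items.drop(index=['store 1'])

print(list(without_watches.columns))  # ['bikes', 'pants', 'glasses']
print(list(only_store_2.index))       # ['store 2']
```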


### 4.4 Renaming

In [None]:
# rename a col:
store_items = store_items.rename(columns={'bikes': 'hats', 'pants': 'jeans'})
store_items

Unnamed: 0,hats,jeans,glasses,shirts
store 2,15,5,50.0,12.0


In [None]:
# use a feature (column) as the index labels:
store_items = store_items.set_index('jeans')
store_items

jeans,hats,glasses,shirts
5,15,50.0,12.0
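
Note that set_index replaces the previous index, so the 'store 2' label is no longer present. The operation can be reversed with reset_index. A sketch, assuming a hypothetical one-row frame that mirrors the renamed data above:

```python
import pandas as pd

# Hypothetical one-row frame mirroring the renamed store_items above.
store_items = pd.DataFrame({'hats': [15], 'jeans': [5],
                            'glasses': [50.0], 'shirts': [12.0]},
                           index=['store 2'])

indexed = store_items.set_index('jeans')
print(indexed.index.name)     # jeans
print(list(indexed.columns))  # ['hats', 'glasses', 'shirts']

# reset_index moves the index back into a regular column
# and restores a default 0..n-1 integer index.
restored = indexed.reset_index()
print(list(restored.columns))  # ['jeans', 'hats', 'glasses', 'shirts']
```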


## 5. Dealing with NaN

In [None]:
# creating a df:
items = [{'bikes': 20, 'pants': 30, 'watches': 35, 'shirts': 15, 'shoes': 8, 'suits': 45},
         {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants': 5, 'shirts': 2, 'shoes': 5, 'suits': 7},
         {'bikes': 20, 'pants': 30, 'watches': 35, 'glasses': 4, 'shoes': 10}]
store_items = pd.DataFrame(items, index=['store 1', 'store 2', 'store 3'])
store_items

Unnamed: 0,bikes,pants,watches,shirts,shoes,suits,glasses
store 1,20,30,35,15.0,8,45.0,
store 2,15,5,10,2.0,5,7.0,50.0
store 3,20,30,35,,10,,4.0


### 5.1 Count the number of NaN values

In [None]:
# count the non-NaN values in each column:
store_items.count()

Unnamed: 0,0
bikes,3
pants,3
watches,3
shirts,2
shoes,3
suits,2
glasses,2


In [None]:
# count the number of nan values:
x = store_items.isnull().sum().sum()
print(x)

3
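
The total can be broken down per column or per row by summing only once. A minimal sketch on the same data, with `nan_per_column` and `nan_per_row` as illustrative names:

```python
import pandas as pd

items = [{'bikes': 20, 'pants': 30, 'watches': 35, 'shirts': 15, 'shoes': 8, 'suits': 45},
         {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants': 5, 'shirts': 2, 'shoes': 5, 'suits': 7},
         {'bikes': 20, 'pants': 30, 'watches': 35, 'glasses': 4, 'shoes': 10}]
store_items = pd.DataFrame(items, index=['store 1', 'store 2', 'store 3'])

nan_per_column = store_items.isna().sum()     # sum down each column
nan_per_row = store_items.isna().sum(axis=1)  # sum across each row

print(nan_per_column['shirts'])  # 1
print(nan_per_row['store 3'])    # 2
print(nan_per_row['store 2'])    # 0
```

Here isna is the same check as isnull; the two names are aliases.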


### 5.2 Removing NaN values

In [None]:
# dropna with axis=0 removes every row that contains a NaN value; it returns a new DataFrame unless inplace=True is passed:
store_items.dropna(axis=0)

Unnamed: 0,bikes,pants,watches,shirts,shoes,suits,glasses
store 2,15,5,10,2.0,5,7.0,50.0
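
Dropping rows discards two of the three stores here. Two alternatives, sketched on the same data with illustrative result names: drop the columns that contain NaN instead, or only check selected columns with subset.

```python
import pandas as pd

items = [{'bikes': 20, 'pants': 30, 'watches': 35, 'shirts': 15, 'shoes': 8, 'suits': 45},
         {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants': 5, 'shirts': 2, 'shoes': 5, 'suits': 7},
         {'bikes': 20, 'pants': 30, 'watches': 35, 'glasses': 4, 'shoes': 10}]
store_items = pd.DataFrame(items, index=['store 1', 'store 2', 'store 3'])

# axis=1 removes every column that contains at least one NaN.
without_nan_cols = store_items.dropna(axis=1)
print(list(without_nan_cols.columns))  # ['bikes', 'pants', 'watches', 'shoes']

# subset= restricts the NaN check to the listed columns.
has_glasses = store_items.dropna(subset=['glasses'])
print(list(has_glasses.index))  # ['store 2', 'store 3']
```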


### 5.3 Replacing NaN values

In [None]:
# Replace them with 0 :
store_items.fillna(0)

Unnamed: 0,bikes,pants,watches,shirts,shoes,suits,glasses
store 1,20,30,35,15.0,8,45.0,0.0
store 2,15,5,10,2.0,5,7.0,50.0
store 3,20,30,35,0.0,10,0.0,4.0


In [None]:
# replace them with the value from the previous row in the same column (forward fill);
# fillna(method='ffill') is deprecated since pandas 2.1, so use ffill directly:
store_items.ffill(axis=0)

Unnamed: 0,bikes,pants,watches,shirts,shoes,suits,glasses
store 1,20,30,35,15.0,8,45.0,
store 2,15,5,10,2.0,5,7.0,50.0
store 3,20,30,35,2.0,10,7.0,4.0
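
Another common replacement is a per-column statistic such as the mean. A sketch on the same data; the choice of the mean and the name `filled` are assumptions for illustration, not something the original notebook prescribes:

```python
import pandas as pd

items = [{'bikes': 20, 'pants': 30, 'watches': 35, 'shirts': 15, 'shoes': 8, 'suits': 45},
         {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants': 5, 'shirts': 2, 'shoes': 5, 'suits': 7},
         {'bikes': 20, 'pants': 30, 'watches': 35, 'glasses': 4, 'shoes': 10}]
store_items = pd.DataFrame(items, index=['store 1', 'store 2', 'store 3'])

# mean() skips NaN by default; fillna matches the means to columns by label.
filled = store_items.fillna(store_items.mean())

print(filled.loc['store 3', 'shirts'])   # 8.5
print(filled.loc['store 3', 'suits'])    # 26.0
print(filled.loc['store 1', 'glasses'])  # 27.0
```

Unlike forward fill, this also fills the missing glasses value for store 1, which has no previous row to copy from.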
