# Pandas v0.22 
# Content:
1. [Creating Pandas Series](#1.-Creating-Pandas-Series)
2. [Accessing and Deleting Elements in Pandas Series](#2.-Accessing-and-Deleting-Elements-in-Pandas-Series)
3. [Changing in a value](#3.-Changing-in-a-value)
4. [Deleting Elements in a new series copy](#4.-Deleting-Elements-in-a-new-series-copy)
5. [Deleting Elements in the original series](#5.-Deleting-Elements-in-the-original-series)
6. [Arithmetic Operations on Pandas Series](#6.-Arithmetic-Operations-on-Pandas-Series)
7. [Add / Sub / Mul / Div operations](#7.-Add-/-Sub-/-Mul-/-Div-operations)
8. [Apply mathematical functions from NumPy](#8.-Apply-mathematical-functions-from-NumPy)
9. [Apply arithmetic operations on selected items](#9.-Apply-arithmetic-operations-on-selected-items)
10. [Arithmetic operations on Pandas Series of mixed data](#10.-Arithmetic-operations-on-Pandas-Series-of-mixed-data)

### Creating Pandas DataFrames

11. [Create a DataFrame from a dictionary of pandas series](#11.-Create-a-DataFrame-from-a-dictionary-of-pandas-series)
12. [Create a dictionary of Pandas Series without indexes](#12.-Create-a-dictionary-of-Pandas-Series-without-indexes)
13. [Some information from our shopping_carts DataFrame](#13.-Some-information-from-our-shopping_carts-DataFrame)
14. [Select which data we want to put into our DataFrame](#14.-Select-which-data-we-want-to-put-into-our-DataFrame)
15. [Create DataFrames from a dictionary of lists](#15.-Create-DataFrames-from-a-dictionary-of-lists)
16. [Put labels to the row index by using the index for last example](#16.-Put-labels-to-the-row-index-by-using-the-index-for-last-example)
17. [Creating Pandas DataFrames with list of Python dictionaries](#17.-Creating-Pandas-DataFrames-with-list-of-Python-dictionaries)
18. [Add some labels for the last example using index](#18.-Add-some-labels-for-the-last-example-using-index)
19. [Accessing Elements in Pandas DataFrames](#19.-Accessing-Elements-in-Pandas-DataFrames)
20. [Access elements in Pandas DataFrames in many different ways](#20.-Access-elements-in-Pandas-DataFrames-in-many-different-ways)
21. [Modify our DataFrames by adding rows or columns](#21.-Modify-our-DataFrames-by-adding-rows-or-columns)
22. [Add new columns by using arithmetic operations between other columns in our DataFrame](#22.-Add-new-columns-by-using-arithmetic-operations-between-other-columns-in-our-DataFrame)
23. [How to add new rows to our DataFrame?](#23.-How-to-add-new-rows-to-our-DataFrame?)
24. [Add new columns by using only data from particular rows](#24.-Add-new-columns-by-using-only-data-from-particular-rows)
25. [Insert new columns into the DataFrames anywhere we want](#25.-Insert-new-columns-into-the-DataFrames-anywhere-we-want)
26. [Delete columns from our DataFrame using .pop()](#26.-Delete-columns-from-our-DataFrame-using-.pop())
27. [Delete columns or rows from our DataFrame using .drop()](#27.-Delete-columns-or-rows-from-our-DataFrame-using-.drop())
28. [Rename row or column labels](#28.-Rename-row-or-column-labels)
29. [Change the index to be one of the columns in the DataFrame](#29.-Change-the-index-to-be-one-of-the-columns-in-the-DataFrame)
30. [Dealing with NaN](#30.-Dealing-with-NaN)
31. [How to count the number of NaN values in store_items?](#31.-How-to-count-the-number-of-NaN-values-in-store_items?)
32. [Number of non-NaN values in our DataFrame](#32.-Number-of-non-NaN-values-in-our-DataFrame)
33. [How to drop any rows or columns with NaN values?](#33.-How-to-drop-any-rows-or-columns-with-NaN-values?)
34. [Replace all NaN values with 0](#34.-Replace-all-NaN-values-with-0)
35. [Using forward filling to replace NaN values](#35.-Using-forward-filling-to-replace-NaN-values)
36. [Using backward filling to replace NaN values](#36.-Using-backward-filling-to-replace-NaN-values)
37. [Using linear interpolation to replace NaN](#37.-Using-linear-interpolation-to-replace-NaN)
38. [Loading Data into a Pandas DataFrame](#38.-Loading-Data-into-a-Pandas-DataFrame)
39. [Get descriptive statistics on each column](#39.-Get-descriptive-statistics-on-each-column)
40. [Apply the .describe() method on a single column](#40.-Apply-the-.describe()-method-on-a-single-column)
41. [Get min, max and mean for each column](#41.-Get-min,-max-and-mean-for-each-column)
42. [How to measure the data correlation?](#42.-How-to-measure-the-data-correlation?)
***

# [1. Creating Pandas Series](#Content:)

In [1]:
import numpy as np
import pandas as pd

In [2]:
# pd.Series(data, index)

# We create a Pandas Series that stores a grocery list
groceries = pd.Series(data = [30, 6, 'Yes', 'No'], index = ['eggs', 'apples', 'milk', 'bread'])

# We display the Groceries Pandas Series
groceries

eggs       30
apples      6
milk      Yes
bread      No
dtype: object

In [3]:
# We print some information about Groceries
print('Groceries has shape:', groceries.shape)
print('Groceries has dimension:', groceries.ndim)
print('Groceries has a total of', groceries.size, 'elements')

Groceries has shape: (4,)
Groceries has dimension: 1
Groceries has a total of 4 elements


In [4]:
# We print the index and data of Groceries
print('The data in Groceries is:', groceries.values)
print('The index of Groceries is:', groceries.index)

The data in Groceries is: [30 6 'Yes' 'No']
The index of Groceries is: Index(['eggs', 'apples', 'milk', 'bread'], dtype='object')


In [5]:
'eggs' in groceries.index

True

In [6]:
'eggs' in groceries

True

In [7]:
'Yes' in groceries.values

True

In [8]:
# We check whether bananas is a food item (an index) in Groceries
x = 'bananas' in groceries

# We check whether bread is a food item (an index) in Groceries
y = 'bread' in groceries

# We print the results
print('Is bananas an index label in Groceries:', x)
print('Is bread an index label in Groceries:', y)

Is bananas an index label in Groceries: False
Is bread an index label in Groceries: True
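The cells above show that `in` on a Series checks the index labels, while `in groceries.values` searches the data. Here is a minimal sketch that contrasts the two with `Series.isin()`, which returns one Boolean per element:

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'], index=['eggs', 'apples', 'milk', 'bread'])

# `in` on the Series itself looks only at the index labels
label_check = 'Yes' in groceries          # False: 'Yes' is a value, not a label
# `in` on .values searches the underlying NumPy array
value_check = 'Yes' in groceries.values   # True

# isin() returns a Boolean Series marking which values appear in the given list
mask = groceries.isin(['Yes', 6])
print(label_check, value_check)
print(mask)
```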


# [2. Accessing and Deleting Elements in Pandas Series](#Content:)

In [9]:
# We access elements in Groceries using index labels:
groceries = pd.Series(data = [30, 6, 'Yes', 'No'], index = ['eggs', 'apples', 'milk', 'bread'])
print(groceries)
# We use a single index label
print('How many eggs do we need to buy:', groceries['eggs'])
print('------------------------------------------------------------')

# we can access multiple index labels
print('Do we need milk and bread:\n', groceries[['milk' , 'bread']]) 
print('------------------------------------------------------------')

# we use loc to access multiple index labels
print('How many eggs and apples do we need to buy:\n', groceries.loc[['eggs','apples']]) 
print('------------------------------------------------------------')

# We access elements in Groceries using numerical indices:

# we use multiple numerical indices (.iloc is the explicit positional accessor;
# plain [] with integer positions on a label index is deprecated in pandas >= 2.1)
print('How many eggs and apples do we need to buy:\n',  groceries.iloc[[0,1]]) 
print('------------------------------------------------------------')

# We use a negative numerical index
print('Do we need bread:\n', groceries.iloc[[-1]]) 
print('------------------------------------------------------------')

# We use a single numerical index
print('How many eggs do we need to buy:', groceries.iloc[0]) 
print('------------------------------------------------------------')
# we use iloc to access multiple numerical indices
print('Do we need milk and bread:\n', groceries.iloc[[2, 3]]) 

eggs       30
apples      6
milk      Yes
bread      No
dtype: object
How many eggs do we need to buy: 30
------------------------------------------------------------
Do we need milk and bread:
 milk     Yes
bread     No
dtype: object
------------------------------------------------------------
How many eggs and apples do we need to buy:
 eggs      30
apples     6
dtype: object
------------------------------------------------------------
How many eggs and apples do we need to buy:
 eggs      30
apples     6
dtype: object
------------------------------------------------------------
Do we need bread:
 bread    No
dtype: object
------------------------------------------------------------
How many eggs do we need to buy: 30
------------------------------------------------------------
Do we need milk and bread:
 milk     Yes
bread     No
dtype: object
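With string labels, an integer inside `[]` can only mean a position. With an integer index, plain `[]` becomes ambiguous. This small sketch uses a hypothetical Series `s` to show why `.loc` (labels) and `.iloc` (positions) are the unambiguous choices:

```python
import pandas as pd

# Hypothetical Series whose integer labels do not match the positions
s = pd.Series(['a', 'b', 'c'], index=[2, 0, 1])

by_label = s.loc[0]      # element whose label is 0 -> 'b'
by_position = s.iloc[0]  # element in the first position -> 'a'
print(by_label, by_position)
```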


# [3. Changing in a value](#Content:)

In [10]:
# We display the original grocery list
print('Original Grocery List:\n', groceries)
print()

# We change the number of eggs to 2
groceries['eggs'] = 2

# We display the changed grocery list
print('Modified Grocery List:\n', groceries)

Original Grocery List:
 eggs       30
apples      6
milk      Yes
bread      No
dtype: object

Modified Grocery List:
 eggs        2
apples      6
milk      Yes
bread      No
dtype: object


# [4. Deleting Elements in a new series copy](#Content:)

In [11]:
# Series.drop(label)

# We display the original grocery list
print('Original Grocery List:\n', groceries)
print()

# We remove apples from our grocery list. The drop function removes elements out of place
print('We remove apples (out of place):\n', groceries.drop('apples'))
print()

# When we remove elements out of place the original Series remains intact. To see this
# we display our grocery list again
print('Grocery List after removing apples out of place:\n', groceries)

Original Grocery List:
 eggs        2
apples      6
milk      Yes
bread      No
dtype: object

We remove apples (out of place):
 eggs       2
milk     Yes
bread     No
dtype: object

Grocery List after removing apples out of place:
 eggs        2
apples      6
milk      Yes
bread      No
dtype: object
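`drop()` also accepts a list of labels. Passing `errors='ignore'` makes it skip labels that are not in the index instead of raising a `KeyError`. A short sketch that rebuilds the grocery list from this section:

```python
import pandas as pd

groceries = pd.Series(data=[2, 6, 'Yes', 'No'], index=['eggs', 'apples', 'milk', 'bread'])

# drop() with a list removes several labels and returns a new Series
smaller = groceries.drop(['apples', 'bread'])

# errors='ignore' silently skips labels that are not in the index
same = groceries.drop('bananas', errors='ignore')

print(smaller)
print(len(groceries), len(same))  # the original still has all 4 elements
```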


# [5. Deleting Elements in the original series](#Content:)

In [12]:
# We display the original grocery list
print('Original Grocery List:\n', groceries)
print()

# We remove apples from our grocery list in place by setting the inplace keyword to True
groceries.drop('apples', inplace = True)

# When we remove elements in place the original Series is modified. To see this
# we display our grocery list again
print('Grocery List after removing apples in place:\n', groceries)

Original Grocery List:
 eggs        2
apples      6
milk      Yes
bread      No
dtype: object

Grocery List after removing apples in place:
 eggs       2
milk     Yes
bread     No
dtype: object


# [6. Arithmetic Operations on Pandas Series](#Content:)

In [13]:
# We create a Pandas Series that stores a grocery list of just fruits
fruits = pd.Series(data = [10, 6, 3], index = ['apples', 'oranges', 'bananas'])

# We display the fruits Pandas Series
fruits

apples     10
oranges     6
bananas     3
dtype: int64

# [7. Add / Sub / Mul / Div operations](#Content:)

In [14]:
# We print fruits for reference
print('Original grocery list of fruits:\n ', fruits)
print()

# We perform basic element-wise operations using arithmetic symbols
print('fruits + 2:\n', fruits + 2) # We add 2 to each item in fruits
print()
print('fruits - 2:\n', fruits - 2) # We subtract 2 from each item in fruits
print()
print('fruits * 2:\n', fruits * 2) # We multiply each item in fruits by 2 
print()
print('fruits / 2:\n', fruits / 2) # We divide each item in fruits by 2
print()

Original grocery list of fruits:
  apples     10
oranges     6
bananas     3
dtype: int64

fruits + 2:
 apples     12
oranges     8
bananas     5
dtype: int64

fruits - 2:
 apples     8
oranges    4
bananas    1
dtype: int64

fruits * 2:
 apples     20
oranges    12
bananas     6
dtype: int64

fruits / 2:
 apples     5.0
oranges    3.0
bananas    1.5
dtype: float64
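The operations above combine a Series with a scalar. When two Series are combined, pandas aligns them on their index labels first. A sketch with a hypothetical second order `extra`; the `.add()` method with `fill_value` treats a missing label as 0:

```python
import pandas as pd

fruits = pd.Series([10, 6, 3], index=['apples', 'oranges', 'bananas'])
# Hypothetical second order with a partly different set of labels
extra = pd.Series([1, 4], index=['apples', 'kiwis'])

# + aligns on labels; a label present in only one Series gives NaN
total = fruits + extra
# .add() with fill_value=0 treats the missing side as 0 instead
total_filled = fruits.add(extra, fill_value=0)
print(total)
print(total_filled)
```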



# [8. Apply mathematical functions from NumPy](#Content:)

In [15]:
# We print fruits for reference
print('Original grocery list of fruits:\n', fruits)
print()

# We apply different mathematical functions to all elements of fruits
print('EXP(X) = \n', np.exp(fruits))
print() 
print('SQRT(X) =\n', np.sqrt(fruits))
print()
print('POW(X,2) =\n',np.power(fruits,2)) # We raise all elements of fruits to the power of 2

Original grocery list of fruits:
 apples     10
oranges     6
bananas     3
dtype: int64

EXP(X) = 
 apples     22026.465795
oranges      403.428793
bananas       20.085537
dtype: float64

SQRT(X) =
 apples     3.162278
oranges    2.449490
bananas    1.732051
dtype: float64

POW(X,2) =
 apples     100
oranges     36
bananas      9
dtype: int64
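NumPy functions return a Series that keeps the original index. The same results are also available through Series methods, and `.apply()` runs any Python function element by element. A brief sketch:

```python
import numpy as np
import pandas as pd

fruits = pd.Series([10, 6, 3], index=['apples', 'oranges', 'bananas'])

# A NumPy ufunc keeps the index labels
roots = np.sqrt(fruits)
# Series.pow is the method equivalent of np.power
squares = fruits.pow(2)
# apply() calls a Python function on every element
labels = fruits.apply(lambda n: 'many' if n > 5 else 'few')
print(roots.index.equals(fruits.index), squares.tolist(), labels.tolist())
```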


# [9. Apply arithmetic operations on selected items](#Content:)

In [16]:
# We print fruits for reference
print('Original grocery list of fruits:\n ', fruits)
print()

# We add 2 only to the bananas
print('Amount of bananas + 2 = ', fruits['bananas'] + 2)
print()

# We subtract 2 from apples
print('Amount of apples - 2 = ', fruits.iloc[0] - 2)
print()

# We multiply apples and oranges by 2
print('We double the amount of apples and oranges:\n', fruits[['apples', 'oranges']] * 2)
print()

# We divide apples and oranges by 2
print('We halve the amount of apples and oranges:\n', fruits.loc[['apples', 'oranges']] / 2)

Original grocery list of fruits:
  apples     10
oranges     6
bananas     3
dtype: int64

Amount of bananas + 2 =  5

Amount of apples - 2 =  8

We double the amount of apples and oranges:
 apples     20
oranges    12
dtype: int64

We halve the amount of apples and oranges:
 apples     5.0
oranges    3.0
dtype: float64
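The expressions above only compute new values; `fruits` itself is unchanged. To store the results, assign them back through `.loc`. A minimal sketch:

```python
import pandas as pd

fruits = pd.Series([10, 6, 3], index=['apples', 'oranges', 'bananas'])

# Augmented assignment through .loc writes the result back into the Series
fruits.loc['bananas'] += 2
fruits.loc[['apples', 'oranges']] *= 2
print(fruits)
```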


# [10. Arithmetic operations on Pandas Series of mixed data](#Content:)

In [17]:
groceries = pd.Series(data = [30, 6, 'Yes', 'No'], index = ['eggs', 'apples', 'milk', 'bread'])
# We multiply our grocery list by 2
groceries * 2

eggs          60
apples        12
milk      YesYes
bread       NoNo
dtype: object

In [18]:
# groceries / 2 raises a TypeError, because the strings 'Yes' and 'No' cannot be divided by 2

In [19]:
# Create a Pandas Series that contains the distance of some planets from the Sun.
# Use the name of the planets as the index to your Pandas Series, and the distance
# from the Sun as your data. The distance from the Sun is in units of 10^6 km

distance_from_sun = [149.6, 1433.5, 227.9, 108.2, 778.6]

planets = ['Earth','Saturn', 'Mars','Venus', 'Jupiter']

# Create a Pandas Series using the above data, with the name of the planets as
# the index and the distance from the Sun as your data.
dist_planets = pd.Series(data = distance_from_sun, index = planets)
print("dist planets =\n",dist_planets)
print()
# Calculate the number of minutes it takes sunlight to reach each planet. You can
# do this by dividing the distance from the Sun for each planet by the speed of light.
# Since in the data above the distance from the Sun is in units of 10^6 km, you can
# use a value for the speed of light of c = 18, since light travels 18 x 10^6 km/minute.
time_light = dist_planets/18
print("time_light=\n",time_light)
print()
# Use Boolean indexing to select only those planets for which sunlight takes less
# than 40 minutes to reach them.
close_planets = time_light[time_light < 40]
print("close_planets=\n",close_planets)

dist planets =
 Earth       149.6
Saturn     1433.5
Mars        227.9
Venus       108.2
Jupiter     778.6
dtype: float64

time_light=
 Earth       8.311111
Saturn     79.638889
Mars       12.661111
Venus       6.011111
Jupiter    43.255556
dtype: float64

close_planets=
 Earth     8.311111
Mars     12.661111
Venus     6.011111
dtype: float64
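One way to do arithmetic on the mixed grocery list without a `TypeError` is to convert it to numbers first. `pd.to_numeric(..., errors='coerce')` turns non-numeric strings into NaN. A sketch:

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'], index=['eggs', 'apples', 'milk', 'bread'])

# Non-numeric entries ('Yes', 'No') become NaN instead of raising
numeric = pd.to_numeric(groceries, errors='coerce')
# Division now works; NaN propagates for milk and bread
halved = numeric / 2
print(halved)
```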


# Creating Pandas DataFrames

# [11. Create a DataFrame from a dictionary of pandas series](#Content:)

In [20]:
# We create a dictionary of Pandas Series 
items = {'Bob' : pd.Series(data = [245, 25, 55], index = ['bike', 'pants', 'watch']),
         'Alice' : pd.Series(data = [40, 110, 500, 45], index = ['book', 'glasses', 'bike', 'pants'])}

# We print the type of items to see that it is a dictionary
print(type(items))

<class 'dict'>


In [21]:
# We create a Pandas DataFrame by passing it a dictionary of Pandas Series
shopping_carts = pd.DataFrame(items)

# We display the DataFrame
shopping_carts

| | Bob | Alice |
|---|---|---|
| bike | 245.0 | 500.0 |
| book | NaN | 40.0 |
| glasses | NaN | 110.0 |
| pants | 25.0 | 45.0 |
| watch | 55.0 | NaN |


# [12. Create a dictionary of Pandas Series without indexes](#Content:)

In [22]:
# We create a dictionary of Pandas Series without indexes
data = {'Bob' : pd.Series([245, 25, 55]),
        'Alice' : pd.Series([40, 110, 500, 45])}

# We create a DataFrame
df = pd.DataFrame(data)

# We display the DataFrame
df

| | Bob | Alice |
|---|---|---|
| 0 | 245.0 | 40 |
| 1 | 25.0 | 110 |
| 2 | 55.0 | 500 |
| 3 | NaN | 45 |
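Wrapping the lists in Series is what allows the two columns to have different lengths. With plain lists, pandas refuses to build the DataFrame. A sketch of both cases:

```python
import pandas as pd

# Plain lists of unequal length raise a ValueError
try:
    pd.DataFrame({'Bob': [245, 25, 55], 'Alice': [40, 110, 500, 45]})
    raised = False
except ValueError:
    raised = True

# Series are aligned on their index, so the shorter one is padded with NaN
df = pd.DataFrame({'Bob': pd.Series([245, 25, 55]),
                   'Alice': pd.Series([40, 110, 500, 45])})
print(raised)
print(df)
```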


# [13. Some information from our shopping_carts DataFrame](#Content:)

In [23]:
# We print some information about shopping_carts
print('shopping_carts has shape:', shopping_carts.shape)
print('shopping_carts has dimension:', shopping_carts.ndim)
print('shopping_carts has a total of:', shopping_carts.size, 'elements')
print()
print('The data in shopping_carts is:\n',shopping_carts.values)
print()
print('The row index in shopping_carts is:', shopping_carts.index)
print()
print('The column index in shopping_carts is:', shopping_carts.columns)

shopping_carts has shape: (5, 2)
shopping_carts has dimension: 2
shopping_carts has a total of: 10 elements

The data in shopping_carts is:
 [[245. 500.]
 [ nan  40.]
 [ nan 110.]
 [ 25.  45.]
 [ 55.  nan]]

The row index in shopping_carts is: Index(['bike', 'book', 'glasses', 'pants', 'watch'], dtype='object')

The column index in shopping_carts is: Index(['Bob', 'Alice'], dtype='object')


# [14. Select which data we want to put into our DataFrame](#Content:)

In [24]:
# We create a DataFrame that only has Bob's data
bob_shopping_cart = pd.DataFrame(items, columns=['Bob'])

# We display bob_shopping_cart
bob_shopping_cart

| | Bob |
|---|---|
| bike | 245 |
| pants | 25 |
| watch | 55 |


In [25]:
# We create a DataFrame that only has selected items for both Alice and Bob
sel_shopping_cart = pd.DataFrame(items, index = ['pants', 'book'])

# We display sel_shopping_cart
sel_shopping_cart

| | Bob | Alice |
|---|---|---|
| pants | 25.0 | 45 |
| book | NaN | 40 |


In [26]:
# We create a DataFrame that only has selected items for Alice
alice_sel_shopping_cart = pd.DataFrame(items, index = ['glasses', 'bike'], columns = ['Alice'])

# We display alice_sel_shopping_cart
alice_sel_shopping_cart

| | Alice |
|---|---|
| glasses | 110 |
| bike | 500 |
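You can get the same selection by building the full DataFrame and then picking rows and columns with `.loc`. One difference, as the sketch shows: the full `Alice` column contains a NaN (for `watch`), so the selected values come out as floats:

```python
import pandas as pd

items = {'Bob': pd.Series(data=[245, 25, 55], index=['bike', 'pants', 'watch']),
         'Alice': pd.Series(data=[40, 110, 500, 45], index=['book', 'glasses', 'bike', 'pants'])}

full = pd.DataFrame(items)
# Same rows/columns as passing index= and columns= to the constructor
alice_sel = full.loc[['glasses', 'bike'], ['Alice']]
# The values match, but the dtype is float64 because the full Alice column holds a NaN
print(alice_sel)
```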


# [15. Create DataFrames from a dictionary of lists](#Content:)

In [27]:
# We create a dictionary of lists (arrays)
data = {'Integers' : [1,2,3],
        'Floats' : [4.5, 8.2, 9.6]}

# We create a DataFrame 
df = pd.DataFrame(data)

# We display the DataFrame
df

| | Integers | Floats |
|---|---|---|
| 0 | 1 | 4.5 |
| 1 | 2 | 8.2 |
| 2 | 3 | 9.6 |


# [16. Put labels to the row index by using the index for last example](#Content:)

In [28]:
# We create a dictionary of lists (arrays)
data = {'Integers' : [1,2,3],
        'Floats' : [4.5, 8.2, 9.6]}

# We create a DataFrame and provide the row index
df = pd.DataFrame(data, index = ['label 1', 'label 2', 'label 3'])

# We display the DataFrame
df

| | Integers | Floats |
|---|---|---|
| label 1 | 1 | 4.5 |
| label 2 | 2 | 8.2 |
| label 3 | 3 | 9.6 |
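A dictionary of lists is column-oriented. If your data arrives row by row instead, one option is a list of tuples plus the `columns=` argument. This sketch rebuilds the same DataFrame; each column still gets its own dtype:

```python
import pandas as pd

# Row-oriented input: each tuple is one row
rows = [(1, 4.5), (2, 8.2), (3, 9.6)]
df = pd.DataFrame(rows, columns=['Integers', 'Floats'],
                  index=['label 1', 'label 2', 'label 3'])
print(df)
print(df.dtypes)
```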


# [17. Creating Pandas DataFrames with list of Python dictionaries](#Content:)

In [29]:
# We create a list of Python dictionaries
items2 = [{'bikes': 20, 'pants': 30, 'watches': 35}, 
          {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants':5}]

# We create a DataFrame 
store_items = pd.DataFrame(items2)

# We display the DataFrame
store_items

| | bikes | pants | watches | glasses |
|---|---|---|---|---|
| 0 | 20 | 30 | 35 | NaN |
| 1 | 15 | 5 | 10 | 50.0 |


# [18. Add some labels for the last example using index](#Content:)

In [30]:
# We create a list of Python dictionaries
items2 = [{'bikes': 20, 'pants': 30, 'watches': 35}, 
          {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants':5}]

# We create a DataFrame  and provide the row index
store_items = pd.DataFrame(items2, index = ['store 1', 'store 2'])

# We display the DataFrame
store_items

| | bikes | pants | watches | glasses |
|---|---|---|---|---|
| store 1 | 20 | 30 | 35 | NaN |
| store 2 | 15 | 5 | 10 | 50.0 |


# [19. Accessing Elements in Pandas DataFrames](#Content:)

In [31]:
# We create a list of Python dictionaries
items2 = [{'bikes': 20, 'pants': 30, 'watches': 35}, 
          {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants':5}]

# We create a DataFrame  and provide the row index
store_items = pd.DataFrame(items2, index = ['store 1', 'store 2'])

# We display the DataFrame
store_items

| | bikes | pants | watches | glasses |
|---|---|---|---|---|
| store 1 | 20 | 30 | 35 | NaN |
| store 2 | 15 | 5 | 10 | 50.0 |


# [20. Access elements in Pandas DataFrames in many different ways](#Content:)

In [32]:
# Note: chained indexing takes the column first, then the row: dataframe[column][row]
# We print the store_items DataFrame
print(store_items)
print()

# We access rows, columns and elements using labels
print('How many bikes are in each store:\n', store_items[['bikes']])
print()
print('How many bikes and pants are in each store:\n', store_items[['bikes', 'pants']])
print()
print('What items are in Store 1:\n', store_items.loc['store 1'])
print()
print('How many bikes are in Store 2:', store_items['bikes']['store 2'])
print()
print('How many bikes are in Store 2:', store_items.loc['store 2','bikes'])
print()
print('All items from bikes to glasses in store 1 and store 2:\n',store_items.loc[: ,'bikes':'glasses'])
print()
print('Items from bikes to watches in store 1 and store 2:\n',store_items.loc['store 1':'store 2','bikes':'watches'])
print()
# Caution: the second [] slices ROWS, not columns. 'store 1' and 'store 2' both sort
# between 'bikes' and 'watches', so both rows are kept and all columns are returned
print('Chained row slice, columns are NOT filtered:\n',store_items.loc['store 1':'store 2']['bikes':'watches'])

         bikes  pants  watches  glasses
store 1     20     30       35      NaN
store 2     15      5       10     50.0

How many bikes are in each store:
          bikes
store 1     20
store 2     15

How many bikes and pants are in each store:
          bikes  pants
store 1     20     30
store 2     15      5

What items are in Store 1:
 bikes      20.0
pants      30.0
watches    35.0
glasses     NaN
Name: store 1, dtype: float64

How many bikes are in Store 2: 15

How many bikes are in Store 2: 15

All items from bikes to glasses in store 1 and store 2:
          bikes  pants  watches  glasses
store 1     20     30       35      NaN
store 2     15      5       10     50.0

Items from bikes to watches in store 1 and store 2:
          bikes  pants  watches
store 1     20     30       35
store 2     15      5       10

Chained row slice, columns are NOT filtered:
          bikes  pants  watches  glasses
store 1     20     30       35      NaN
store 2     15      5       10     50.0
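For single values, `.at` (by label) and `.iat` (by position) are the dedicated scalar accessors. A Boolean condition inside `[]` selects rows. A short sketch on the same `store_items` data:

```python
import pandas as pd

items2 = [{'bikes': 20, 'pants': 30, 'watches': 35},
          {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants': 5}]
store_items = pd.DataFrame(items2, index=['store 1', 'store 2'])

# Scalar access by label and by position
bikes_store2 = store_items.at['store 2', 'bikes']
first_cell = store_items.iat[0, 0]
# Boolean row selection: stores with more than 10 watches
many_watches = store_items[store_items['watches'] > 10]
print(bikes_store2, first_cell)
print(many_watches)
```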


# [21. Modify our DataFrames by adding rows or columns](#Content:)

In [33]:
# We add a new column named shirts to our store_items DataFrame indicating the number of
# shirts in stock at each store. We will put 15 shirts in store 1 and 2 shirts in store 2
store_items['shirts'] = [15,2]

# We display the modified DataFrame
store_items

| | bikes | pants | watches | glasses | shirts |
|---|---|---|---|---|---|
| store 1 | 20 | 30 | 35 | NaN | 15 |
| store 2 | 15 | 5 | 10 | 50.0 | 2 |


# [22. Add new columns by using arithmetic operations between other columns in our DataFrame](#Content:)

In [34]:
# We make a new column called suits by adding the number of shirts and pants
store_items['suits'] = store_items['pants'] + store_items['shirts']

# We display the modified DataFrame
store_items

| | bikes | pants | watches | glasses | shirts | suits |
|---|---|---|---|---|---|---|
| store 1 | 20 | 30 | 35 | NaN | 15 | 45 |
| store 2 | 15 | 5 | 10 | 50.0 | 2 | 7 |
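If you want to keep the original DataFrame unchanged, `DataFrame.assign()` returns a new DataFrame with the extra column. A sketch using a hypothetical two-column version of `store_items`:

```python
import pandas as pd

# Hypothetical reduced version of store_items with only the columns we need
store_items = pd.DataFrame({'pants': [30, 5], 'shirts': [15, 2]}, index=['store 1', 'store 2'])

# assign() builds a new DataFrame; store_items itself is not modified
with_suits = store_items.assign(suits=store_items['pants'] + store_items['shirts'])
print(with_suits)
```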


# [23. How to add new rows to our DataFrame?](#Content:)
### First: create a new DataFrame for this row

In [35]:
# We create a list containing one Python dictionary with the number of items at the new store
new_items = [{'bikes': 20, 'pants': 30, 'watches': 35, 'glasses': 4}]

# We create a new DataFrame with the new_items and provide an index labeled store 3
new_store = pd.DataFrame(new_items, index = ['store 3'])

# We display the items at the new store
new_store

| | bikes | pants | watches | glasses |
|---|---|---|---|---|
| store 3 | 20 | 30 | 35 | 4 |


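### Second: add this row to our store_items DataFrame by using pd.concat() (the .append() method in older pandas)

In [36]:
# We append store 3 to our store_items DataFrame.
# DataFrame.append() did the same thing but was removed in pandas 2.0; pd.concat() works in all versions
store_items = pd.concat([store_items, new_store])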

# We display the modified DataFrame
store_items

| | bikes | pants | watches | glasses | shirts | suits |
|---|---|---|---|---|---|---|
| store 1 | 20 | 30 | 35 | NaN | 15.0 | 45.0 |
| store 2 | 15 | 5 | 10 | 50.0 | 2.0 | 7.0 |
| store 3 | 20 | 30 | 35 | 4.0 | NaN | NaN |
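Another way to add a single row is to assign to a new label with `.loc` ("setting with enlargement"), which modifies the DataFrame in place. A sketch with a hypothetical two-column frame:

```python
import pandas as pd

# Hypothetical reduced store_items with two columns
store_items = pd.DataFrame({'bikes': [20, 15], 'pants': [30, 5]}, index=['store 1', 'store 2'])

# Assigning to a label that does not exist yet appends a new row
store_items.loc['store 3'] = [20, 30]
print(store_items)
```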


# [24. Add new columns by using only data from particular rows](#Content:)

In [37]:
# We add a new column using data from particular rows in the watches column
store_items['new watches'] = store_items['watches']['store 2': ]

# We display the modified DataFrame
store_items 

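| | bikes | pants | watches | glasses | shirts | suits | new watches |
|---|---|---|---|---|---|---|---|
| store 1 | 20 | 30 | 35 | NaN | 15.0 | 45.0 | NaN |
| store 2 | 15 | 5 | 10 | 50.0 | 2.0 | 7.0 | 10.0 |
| store 3 | 20 | 30 | 35 | 4.0 | NaN | NaN | 35.0 |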


# [25. Insert new columns into the DataFrames anywhere we want](#Content:)

In [38]:
# dataframe.insert(loc,label,data)

# We insert a new column with label shoes right before the column with numerical index 4
store_items.insert(4, 'shoes', [8,5,9])

# we display the modified DataFrame
store_items

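| | bikes | pants | watches | glasses | shoes | shirts | suits | new watches |
|---|---|---|---|---|---|---|---|---|
| store 1 | 20 | 30 | 35 | NaN | 8 | 15.0 | 45.0 | NaN |
| store 2 | 15 | 5 | 10 | 50.0 | 5 | 2.0 | 7.0 | 10.0 |
| store 3 | 20 | 30 | 35 | 4.0 | 9 | NaN | NaN | 35.0 |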


# [26. Delete columns from our DataFrame using .pop()](#Content:)

In [39]:
# We remove the new watches column
store_items.pop('new watches')

# we display the modified DataFrame
store_items

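| | bikes | pants | watches | glasses | shoes | shirts | suits |
|---|---|---|---|---|---|---|---|
| store 1 | 20 | 30 | 35 | NaN | 8 | 15.0 | 45.0 |
| store 2 | 15 | 5 | 10 | 50.0 | 5 | 2.0 | 7.0 |
| store 3 | 20 | 30 | 35 | 4.0 | 9 | NaN | NaN |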


# [27. Delete columns or rows from our DataFrame using .drop()](#Content:)

In [40]:
# We remove the watches and shoes columns
store_items = store_items.drop([ 'watches','shoes'], axis = 1)

# we display the modified DataFrame
store_items

| | bikes | pants | glasses | shirts | suits |
|---|---|---|---|---|---|
| store 1 | 20 | 30 | NaN | 15.0 | 45.0 |
| store 2 | 15 | 5 | 50.0 | 2.0 | 7.0 |
| store 3 | 20 | 30 | 4.0 | NaN | NaN |


In [41]:
# We remove the store 2 and store 1 rows
store_items = store_items.drop(['store 2', 'store 1'], axis = 0)

# we display the modified DataFrame
store_items

| | bikes | pants | glasses | shirts | suits |
|---|---|---|---|---|---|
| store 3 | 20 | 30 | 4.0 | NaN | NaN |
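Since pandas 0.21, `drop()` also accepts `columns=` and `index=` keywords, so you do not have to remember which axis number means what. A sketch on a hypothetical small frame:

```python
import pandas as pd

df = pd.DataFrame({'bikes': [20, 15], 'watches': [35, 10], 'shoes': [8, 5]},
                  index=['store 1', 'store 2'])

# Equivalent to df.drop(['watches', 'shoes'], axis=1)
no_watches = df.drop(columns=['watches', 'shoes'])
# Equivalent to df.drop('store 1', axis=0)
no_store1 = df.drop(index='store 1')
print(no_watches)
print(no_store1)
```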


# [28. Rename row or column labels](#Content:)

In [42]:
# We change the column label bikes to hats
store_items = store_items.rename(columns = {'bikes': 'hats'})

# we display the modified DataFrame
store_items

| | hats | pants | glasses | shirts | suits |
|---|---|---|---|---|---|
| store 3 | 20 | 30 | 4.0 | NaN | NaN |


In [43]:
# We change the row label from store 3 to last store
store_items = store_items.rename(index = {'store 3': 'last store'})

# we display the modified DataFrame
store_items

| | hats | pants | glasses | shirts | suits |
|---|---|---|---|---|---|
| last store | 20 | 30 | 4.0 | NaN | NaN |
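`rename()` also accepts a function instead of a dictionary, and applies it to every label on that axis. A sketch on a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({'bikes': [20, 15], 'pants': [30, 5]}, index=['store 1', 'store 2'])

# A callable is applied to each column label
upper = df.rename(columns=str.upper)
# ...or to each row label
shops = df.rename(index=lambda label: label.replace('store', 'shop'))
print(upper)
print(shops)
```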


# [29. Change the index to be one of the columns in the DataFrame](#Content:)

In [44]:
# We change the row index to be the data in the pants column
store_items = store_items.set_index('pants')

# we display the modified DataFrame
store_items

| pants | hats | glasses | shirts | suits |
|---|---|---|---|---|
| 30 | 20 | 4.0 | NaN | NaN |
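`set_index()` removes the column by default. `drop=False` keeps a copy of it as a column, and `reset_index()` moves the index back into an ordinary column. A sketch on a hypothetical one-row frame resembling the one above:

```python
import pandas as pd

store_items = pd.DataFrame({'hats': [20], 'pants': [30], 'glasses': [4.0]}, index=['last store'])

indexed = store_items.set_index('pants')              # 'pants' leaves the columns
kept = store_items.set_index('pants', drop=False)     # 'pants' is both index and column
# reset_index() turns the 'pants' index back into a column with a default 0..n-1 index
restored = indexed.reset_index()
print(restored)
```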


# [30. Dealing with NaN](#Content:)

In [45]:
# We create a list of Python dictionaries
items2 = [{'bikes': 20, 'pants': 30, 'glasses': 50,'watches': 35, 'shirts': 15, 'shoes':8, 'suits':45},
          {'watches': 10,  'bikes': 15, 'pants':5, 'shirts': 2, 'shoes':5, 'suits':7},
          {'bikes': 20, 'pants': 30, 'watches': 35, 'glasses': 4, 'shoes':10}]

# We create a DataFrame  and provide the row index
store_items = pd.DataFrame(items2, index = ['store 1', 'store 2', 'store 3'])

# We display the DataFrame
store_items

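| | bikes | pants | glasses | watches | shirts | shoes | suits |
|---|---|---|---|---|---|---|---|
| store 1 | 20 | 30 | 50.0 | 35 | 15.0 | 8 | 45.0 |
| store 2 | 15 | 5 | NaN | 10 | 2.0 | 5 | 7.0 |
| store 3 | 20 | 30 | 4.0 | 35 | NaN | 10 | NaN |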


# [31. How to count the number of NaN values in store_items?](#Content:)

In [46]:
# We count the number of NaN values in store_items
x =  store_items.isnull().sum().sum()

# We print x
print('Number of NaN values in our DataFrame:\n', x)

Number of NaN values in our DataFrame:
 3


# [32. Number of non-NaN values in our DataFrame](#Content:)

In [47]:
# We print the number of non-NaN values in our DataFrame
print('Number of non-NaN values in the columns of our DataFrame:\n', store_items.count().sum())

Number of non-NaN values in the columns of our DataFrame:
 18


# [33. How to drop any rows or columns with NaN values?](#Content:)

In [48]:
# We drop any rows with NaN values
store_items.dropna(axis = 0)

| | bikes | pants | glasses | watches | shirts | shoes | suits |
|---|---|---|---|---|---|---|---|
| store 1 | 20 | 30 | 50.0 | 35 | 15.0 | 8 | 45.0 |


In [49]:
# We drop any columns with NaN values
store_items.dropna(axis = 1)

| | bikes | pants | watches | shoes |
|---|---|---|---|---|
| store 1 | 20 | 30 | 35 | 8 |
| store 2 | 15 | 5 | 10 | 5 |
| store 3 | 20 | 30 | 35 | 10 |
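`dropna()` can be less drastic than dropping every row that has any NaN. `thresh=N` keeps rows with at least N non-NaN values, and `subset=` only checks the listed columns. A sketch on the same `store_items` data:

```python
import pandas as pd

items2 = [{'bikes': 20, 'pants': 30, 'glasses': 50, 'watches': 35, 'shirts': 15, 'shoes': 8, 'suits': 45},
          {'watches': 10, 'bikes': 15, 'pants': 5, 'shirts': 2, 'shoes': 5, 'suits': 7},
          {'bikes': 20, 'pants': 30, 'watches': 35, 'glasses': 4, 'shoes': 10}]
store_items = pd.DataFrame(items2, index=['store 1', 'store 2', 'store 3'])

# Keep rows that have at least 6 non-NaN values (store 3 has only 5)
at_least_6 = store_items.dropna(axis=0, thresh=6)
# Drop a row only if it is missing a value in the 'glasses' column
has_glasses = store_items.dropna(subset=['glasses'])
print(at_least_6.index.tolist(), has_glasses.index.tolist())
```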


# [34. Replace all NaN values with 0](#Content:)

In [50]:
# We replace all NaN values with 0
store_items.fillna(0)

| | bikes | pants | glasses | watches | shirts | shoes | suits |
|---|---|---|---|---|---|---|---|
| store 1 | 20 | 30 | 50.0 | 35 | 15.0 | 8 | 45.0 |
| store 2 | 15 | 5 | 0.0 | 10 | 2.0 | 5 | 7.0 |
| store 3 | 20 | 30 | 4.0 | 35 | 0.0 | 10 | 0.0 |
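`fillna()` also accepts a dictionary that maps column names to fill values. This lets each column use its own replacement, and columns not in the dictionary keep their NaN. A sketch (the fill values 0 and 1 are arbitrary):

```python
import pandas as pd

items2 = [{'bikes': 20, 'pants': 30, 'glasses': 50, 'watches': 35, 'shirts': 15, 'shoes': 8, 'suits': 45},
          {'watches': 10, 'bikes': 15, 'pants': 5, 'shirts': 2, 'shoes': 5, 'suits': 7},
          {'bikes': 20, 'pants': 30, 'watches': 35, 'glasses': 4, 'shoes': 10}]
store_items = pd.DataFrame(items2, index=['store 1', 'store 2', 'store 3'])

# Different fill value per column; 'suits' is not listed, so its NaN stays
filled = store_items.fillna({'glasses': 0, 'shirts': 1})
print(filled)
```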


# [35. Using forward filling to replace NaN values](#Content:)

In [51]:
# DataFrame.ffill(axis) is equivalent to DataFrame.fillna(method = 'ffill', axis);
# the method= form is deprecated in pandas >= 2.1

# We replace NaN values with the previous value in the column
store_items.ffill(axis = 0)

| | bikes | pants | glasses | watches | shirts | shoes | suits |
|---|---|---|---|---|---|---|---|
| store 1 | 20 | 30 | 50.0 | 35 | 15.0 | 8 | 45.0 |
| store 2 | 15 | 5 | 50.0 | 10 | 2.0 | 5 | 7.0 |
| store 3 | 20 | 30 | 4.0 | 35 | 2.0 | 10 | 7.0 |


In [52]:
# We replace NaN values with the previous value in the row
store_items.ffill(axis = 1)

| | bikes | pants | glasses | watches | shirts | shoes | suits |
|---|---|---|---|---|---|---|---|
| store 1 | 20.0 | 30.0 | 50.0 | 35.0 | 15.0 | 8.0 | 45.0 |
| store 2 | 15.0 | 5.0 | 5.0 | 10.0 | 2.0 | 5.0 | 7.0 |
| store 3 | 20.0 | 30.0 | 4.0 | 35.0 | 35.0 | 10.0 | 10.0 |


# [36. Using backward filling to replace NaN values](#Content:)

In [53]:
# We replace NaN values with the next value in the column
# (DataFrame.bfill is equivalent to fillna(method = 'backfill'), which is deprecated in pandas >= 2.1)
store_items.bfill(axis = 0)

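| | bikes | pants | glasses | watches | shirts | shoes | suits |
|---|---|---|---|---|---|---|---|
| store 1 | 20 | 30 | 50.0 | 35 | 15.0 | 8 | 45.0 |
| store 2 | 15 | 5 | 4.0 | 10 | 2.0 | 5 | 7.0 |
| store 3 | 20 | 30 | 4.0 | 35 | NaN | 10 | NaN |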


In [54]:
# We replace NaN values with the next value in the row
store_items.bfill(axis = 1)

| | bikes | pants | glasses | watches | shirts | shoes | suits |
|---|---|---|---|---|---|---|---|
| store 1 | 20.0 | 30.0 | 50.0 | 35.0 | 15.0 | 8.0 | 45.0 |
| store 2 | 15.0 | 5.0 | 10.0 | 10.0 | 2.0 | 5.0 | 7.0 |
| store 3 | 20.0 | 30.0 | 4.0 | 35.0 | 10.0 | 10.0 | NaN |


# [37. Using linear interpolation to replace NaN](#Content:)

In [55]:
# We replace NaN values by using linear interpolation using column values
store_items.interpolate(method = 'linear', axis = 0)

| | bikes | pants | glasses | watches | shirts | shoes | suits |
|---|---|---|---|---|---|---|---|
| store 1 | 20 | 30 | 50.0 | 35 | 15.0 | 8 | 45.0 |
| store 2 | 15 | 5 | 27.0 | 10 | 2.0 | 5 | 7.0 |
| store 3 | 20 | 30 | 4.0 | 35 | 2.0 | 10 | 7.0 |


In [56]:
# We replace NaN values by using linear interpolation using row values
store_items.interpolate(method = 'linear', axis = 1)

| | bikes | pants | glasses | watches | shirts | shoes | suits |
|---|---|---|---|---|---|---|---|
| store 1 | 20.0 | 30.0 | 50.0 | 35.0 | 15.0 | 8.0 | 45.0 |
| store 2 | 15.0 | 5.0 | 7.5 | 10.0 | 2.0 | 5.0 | 7.0 |
| store 3 | 20.0 | 30.0 | 4.0 | 35.0 | 22.5 | 10.0 | 10.0 |


# [38. Loading Data into a Pandas DataFrame](#Content:)

In [57]:
# We load Google stock data in a DataFrame
Google_stock = pd.read_csv('GOOG.csv')

# We print some information about Google_stock
print('Google_stock is of type:', type(Google_stock))
print('Google_stock has shape:', Google_stock.shape)

Google_stock is of type: <class 'pandas.core.frame.DataFrame'>
Google_stock has shape: (3313, 7)


In [58]:
Google_stock

| | Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|---|
| 0 | 2004-08-19 | 49.676899 | 51.693783 | 47.669952 | 49.845802 | 49.845802 | 44994500 |
| 1 | 2004-08-20 | 50.178635 | 54.187561 | 49.925285 | 53.805050 | 53.805050 | 23005800 |
| 2 | 2004-08-23 | 55.017166 | 56.373344 | 54.172661 | 54.346527 | 54.346527 | 18393200 |
| 3 | 2004-08-24 | 55.260582 | 55.439419 | 51.450363 | 52.096165 | 52.096165 | 15361800 |
| 4 | 2004-08-25 | 52.140873 | 53.651051 | 51.604362 | 52.657513 | 52.657513 | 9257400 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 3308 | 2017-10-09 | 980.000000 | 985.424988 | 976.109985 | 977.000000 | 977.000000 | 891400 |
| 3309 | 2017-10-10 | 980.000000 | 981.570007 | 966.080017 | 972.599976 | 972.599976 | 968400 |
| 3310 | 2017-10-11 | 973.719971 | 990.710022 | 972.250000 | 989.250000 | 989.250000 | 1693300 |
| 3311 | 2017-10-12 | 987.450012 | 994.119995 | 985.000000 | 987.830017 | 987.830017 | 1262400 |
| 3312 | 2017-10-13 | 992.000000 | 997.210022 | 989.000000 | 989.679993 | 989.679993 | 1157700 |


In [59]:
Google_stock.head()

| | Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|---|
| 0 | 2004-08-19 | 49.676899 | 51.693783 | 47.669952 | 49.845802 | 49.845802 | 44994500 |
| 1 | 2004-08-20 | 50.178635 | 54.187561 | 49.925285 | 53.80505 | 53.80505 | 23005800 |
| 2 | 2004-08-23 | 55.017166 | 56.373344 | 54.172661 | 54.346527 | 54.346527 | 18393200 |
| 3 | 2004-08-24 | 55.260582 | 55.439419 | 51.450363 | 52.096165 | 52.096165 | 15361800 |
| 4 | 2004-08-25 | 52.140873 | 53.651051 | 51.604362 | 52.657513 | 52.657513 | 9257400 |


In [60]:
Google_stock.tail()

| | Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|---|
| 3308 | 2017-10-09 | 980.0 | 985.424988 | 976.109985 | 977.0 | 977.0 | 891400 |
| 3309 | 2017-10-10 | 980.0 | 981.570007 | 966.080017 | 972.599976 | 972.599976 | 968400 |
| 3310 | 2017-10-11 | 973.719971 | 990.710022 | 972.25 | 989.25 | 989.25 | 1693300 |
| 3311 | 2017-10-12 | 987.450012 | 994.119995 | 985.0 | 987.830017 | 987.830017 | 1262400 |
| 3312 | 2017-10-13 | 992.0 | 997.210022 | 989.0 | 989.679993 | 989.679993 | 1157700 |


In [61]:
Google_stock.tail(10)

| | Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|---|
| 3303 | 2017-10-02 | 959.97998 | 962.539978 | 947.840027 | 953.27002 | 953.27002 | 1283400 |
| 3304 | 2017-10-03 | 954.0 | 958.0 | 949.140015 | 957.789978 | 957.789978 | 888300 |
| 3305 | 2017-10-04 | 957.0 | 960.390015 | 950.690002 | 951.679993 | 951.679993 | 952400 |
| 3306 | 2017-10-05 | 955.48999 | 970.909973 | 955.179993 | 969.960022 | 969.960022 | 1213800 |
| 3307 | 2017-10-06 | 966.700012 | 979.460022 | 963.359985 | 978.890015 | 978.890015 | 1173900 |
| 3308 | 2017-10-09 | 980.0 | 985.424988 | 976.109985 | 977.0 | 977.0 | 891400 |
| 3309 | 2017-10-10 | 980.0 | 981.570007 | 966.080017 | 972.599976 | 972.599976 | 968400 |
| 3310 | 2017-10-11 | 973.719971 | 990.710022 | 972.25 | 989.25 | 989.25 | 1693300 |
| 3311 | 2017-10-12 | 987.450012 | 994.119995 | 985.0 | 987.830017 | 987.830017 | 1262400 |
| 3312 | 2017-10-13 | 992.0 | 997.210022 | 989.0 | 989.679993 | 989.679993 | 1157700 |


In [62]:
Google_stock.isnull().any()

Date         False
Open         False
High         False
Low          False
Close        False
Adj Close    False
Volume       False
dtype: bool
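In this DataFrame, `Date` is loaded as plain strings. `GOOG.csv` is not bundled here, so this sketch uses `io.StringIO` as a stand-in for the file and feeds it two rows copied from the output above. It shows one option: `parse_dates` and `index_col` turn `Date` into a `DatetimeIndex`:

```python
import io
import pandas as pd

# Two rows copied from the GOOG.csv output above (in-memory stand-in for the file)
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2017-10-12,987.450012,994.119995,985.0,987.830017,987.830017,1262400
2017-10-13,992.0,997.210022,989.0,989.679993,989.679993,1157700
"""

# parse_dates converts the column to datetime64; index_col makes it the row index
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'], index_col='Date')
print(df.index)
print(df.loc[pd.Timestamp('2017-10-13'), 'Close'])
```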

# [39. Get descriptive statistics on each column](#Content:)

In [63]:
# We get descriptive statistics on our stock data
Google_stock.describe()

| | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| count | 3313.0 | 3313.0 | 3313.0 | 3313.0 | 3313.0 | 3313.0 |
| mean | 380.186092 | 383.49374 | 376.519309 | 380.072458 | 380.072458 | 8038476.0 |
| std | 223.81865 | 224.974534 | 222.473232 | 223.85378 | 223.85378 | 8399521.0 |
| min | 49.274517 | 50.541279 | 47.669952 | 49.681866 | 49.681866 | 7900.0 |
| 25% | 226.556473 | 228.394516 | 224.003082 | 226.40744 | 226.40744 | 2584900.0 |
| 50% | 293.312286 | 295.433502 | 289.929291 | 293.029114 | 293.029114 | 5281300.0 |
| 75% | 536.650024 | 540.0 | 532.409973 | 536.690002 | 536.690002 | 10653700.0 |
| max | 992.0 | 997.210022 | 989.0 | 989.679993 | 989.679993 | 82768100.0 |


# [40. Apply the .describe() method on a single column](#Content:)

In [64]:
# We get descriptive statistics on a single column of our DataFrame
Google_stock['Adj Close'].describe()

count    3313.000000
mean      380.072458
std       223.853780
min        49.681866
25%       226.407440
50%       293.029114
75%       536.690002
max       989.679993
Name: Adj Close, dtype: float64
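`describe()` reports the 25%, 50% and 75% quantiles by default. The `percentiles` parameter changes them, and the median (50%) is always included. A sketch on five closing prices copied from the `tail(10)` output above:

```python
import pandas as pd

# Five Close values from the tail(10) output above
close = pd.Series([953.27002, 957.789978, 951.679993, 969.960022, 978.890015])

stats = close.describe(percentiles=[0.1, 0.9])
print(stats)
```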

# [41. Get min, max and mean for each column](#Content:)

In [65]:
# We print information about our DataFrame
print('Maximum values of each column:\n', Google_stock.max())
print()
print('Minimum Close value:', Google_stock['Close'].min())
print()
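# numeric_only=True skips the string Date column (pandas >= 2.0 raises a TypeError without it)
print('Average value of each column:\n', Google_stock.mean(numeric_only=True))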

Maximum values of each column:
 Date         2017-10-13
Open              992.0
High         997.210022
Low               989.0
Close        989.679993
Adj Close    989.679993
Volume         82768100
dtype: object

Minimum Close value: 49.681866

Average value of each column:
 Open         3.801861e+02
High         3.834937e+02
Low          3.765193e+02
Close        3.800725e+02
Adj Close    3.800725e+02
Volume       8.038476e+06
dtype: float64
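`max()` returns the largest value but not where it occurs. `idxmax()` returns the row label of the maximum, which you can then use to look up the matching date. A sketch on three rows copied from the output above:

```python
import pandas as pd

# Three rows from the tail of GOOG.csv shown above
df = pd.DataFrame({'Date': ['2017-10-11', '2017-10-12', '2017-10-13'],
                   'Close': [989.25, 987.830017, 989.679993]})

# Row label of the highest Close, then the Date in that row
best_row = df['Close'].idxmax()
best_date = df.loc[best_row, 'Date']
print(best_date)
```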


# [42. How to measure the data correlation?](#Content:)

In [66]:
# We display the correlation between columns
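# Date is a string column, so we keep only the numeric columns (pandas >= 2.0 raises an error otherwise)
Google_stock.select_dtypes(include='number').corr()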

| | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| Open | 1.0 | 0.999904 | 0.999845 | 0.999745 | 0.999745 | -0.564258 |
| High | 0.999904 | 1.0 | 0.999834 | 0.999868 | 0.999868 | -0.562749 |
| Low | 0.999845 | 0.999834 | 1.0 | 0.999899 | 0.999899 | -0.567007 |
| Close | 0.999745 | 0.999868 | 0.999899 | 1.0 | 1.0 | -0.564967 |
| Adj Close | 0.999745 | 0.999868 | 0.999899 | 1.0 | 1.0 | -0.564967 |
| Volume | -0.564258 | -0.562749 | -0.567007 | -0.564967 | -0.564967 | 1.0 |
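By default `DataFrame.corr()` computes the Pearson coefficient, which measures linear association. It also accepts `method='spearman'` (rank-based) and `'kendall'`. A toy sketch with hypothetical columns `x`, `y` and `z`, where `y` rises linearly with `x` and `z` falls:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [2, 4, 6, 8], 'z': [4, 3, 2, 1]})

pearson = df.corr()                    # default method='pearson'
spearman = df.corr(method='spearman')  # correlation of the ranks
print(pearson.loc['x', 'y'], spearman.loc['x', 'z'])
```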
