In [2]:
import pandas as pd

There are two main data types in pandas, the Series and the DataFrame. First we'll look at the Series.

## Series data type

A Series is a one-dimensional array that can hold multiple data types at once, along with labels (the index) for the stored data

In [3]:
groceries = pd.Series(data=[30, 6, 'yes', 'no'], index=['eggs', 'apples', 'milk', 'bread'])
print(groceries)

eggs       30
apples      6
milk      yes
bread      no
dtype: object


.shape works like it does in NumPy and gives us the dimensions of the data

In [4]:
groceries.shape

(4,)

ndim gives us the number of dimensions of the data

In [5]:
groceries.ndim

1

.size gives us the total number of values in the Series

In [6]:
groceries.size

4

we can get just the data with .values

In [7]:
groceries.values

array([30, 6, 'yes', 'no'], dtype=object)

or just the indexes with .index

In [8]:
groceries.index

Index(['eggs', 'apples', 'milk', 'bread'], dtype='object')

You can check whether an index label exists with the in keyword, e.g.:


In [9]:
print('bread' in groceries)

'yogurt' in groceries

True


False
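A minimal sketch of what in actually checks, recreating the groceries Series from above: the test looks at the index labels, not the stored values. Checking against .tolist() to search the values is just one simple option, chosen here for illustration.

```python
import pandas as pd

# recreate the groceries Series from the cell above
groceries = pd.Series(data=[30, 6, 'yes', 'no'], index=['eggs', 'apples', 'milk', 'bread'])

# `in` on a Series looks at the index labels
label_found = 'bread' in groceries      # 'bread' is a label
value_as_label = 'yes' in groceries     # 'yes' is a stored value, not a label

# to look for a stored value, test the list of values instead
value_found = 'yes' in groceries.tolist()

print(label_found, value_as_label, value_found)  # True False True
```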

### Accessing and deleting elements in a Series

The first way is by label:

In [10]:
groceries['eggs']

30

We can get more than one element at once as well. We need to add an extra set of ```[]``` (that is, pass a list of labels) for it to work

In [11]:
groceries[['eggs', 'milk']]

eggs     30
milk    yes
dtype: object

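We can also grab elements by their numerical position. Note that in pandas 2.1 and later, using plain [] with an integer position on a Series that has a labeled index is deprecated (it emits a FutureWarning); .iloc, shown below, is the reliable way to index by position.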

In [12]:
groceries[0]

30

In [13]:
groceries[-1]

'no'

In [14]:
groceries[[0,1]]

eggs      30
apples     6
dtype: object

We can use .loc to state explicitly that we are indexing by label

In [15]:
groceries.loc[['eggs', 'apples']]

eggs      30
apples     6
dtype: object

and we can use .iloc to state explicitly that we are indexing by numerical position

In [16]:
groceries.iloc[[1,2]]

apples      6
milk      yes
dtype: object
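Because [] can mean either a label or a position, the difference between .loc and .iloc matters most when the labels are themselves integers. A small sketch with a hypothetical Series s whose labels are 10, 20 and 30:

```python
import pandas as pd

# a hypothetical Series whose labels happen to be integers
s = pd.Series(['a', 'b', 'c'], index=[10, 20, 30])

by_label = s.loc[20]       # the element labeled 20
by_position = s.iloc[0]    # the element in position 0
several = s.loc[[10, 30]]  # a list of labels works too

print(by_label, by_position, several.tolist())  # b a ['a', 'c']
# s.loc[0] would raise a KeyError here, because no element is labeled 0
```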

#### series are mutable

In [17]:
groceries

eggs       30
apples      6
milk      yes
bread      no
dtype: object

In [18]:
groceries['eggs'] = 2

In [19]:
groceries

eggs        2
apples      6
milk      yes
bread      no
dtype: object

We can remove an element with .drop. .drop returns a new Series with the element removed but doesn't change the original Series

In [20]:
groceries.drop('apples')

eggs       2
milk     yes
bread     no
dtype: object

In [21]:
groceries

eggs        2
apples      6
milk      yes
bread      no
dtype: object

To remove the element from the original Series, we add the inplace parameter, i.e. ```inplace=True```. The Series is then modified in place and the method returns None instead of a new Series

In [22]:
groceries.drop('apples', inplace=True)

In [23]:
groceries

eggs       2
milk     yes
bread     no
dtype: object
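A short sketch of the two behaviors side by side, using a hypothetical three-item version of groceries:

```python
import pandas as pd

groceries = pd.Series([2, 'yes', 'no'], index=['eggs', 'milk', 'bread'])

# without inplace: .drop returns a new Series and leaves the original alone
smaller = groceries.drop('milk')
print(smaller.index.tolist())    # ['eggs', 'bread']
print(groceries.index.tolist())  # ['eggs', 'milk', 'bread']

# with inplace=True: the original is modified and the call returns None
result = groceries.drop('milk', inplace=True)
print(result)                    # None
print(groceries.index.tolist())  # ['eggs', 'bread']
```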

### Arithmetic operations on pandas Series

In [24]:
fruits = pd.Series([10,6,3], ['apples', 'oranges', 'banana'])

In [25]:
fruits

apples     10
oranges     6
banana      3
dtype: int64

In [26]:
fruits + 2

apples     12
oranges     8
banana      5
dtype: int64

In [27]:
fruits - 2

apples     8
oranges    4
banana     1
dtype: int64

In [28]:
fruits * 2

apples     20
oranges    12
banana      6
dtype: int64

In [29]:
fruits / 2

apples     5.0
oranges    3.0
banana     1.5
dtype: float64

#### NumPy lets us apply more complex math to our Series

In [30]:
import numpy as np
fruits

apples     10
oranges     6
banana      3
dtype: int64

In [31]:
np.sqrt(fruits)

apples     3.162278
oranges    2.449490
banana     1.732051
dtype: float64

In [32]:
np.exp(fruits)

apples     22026.465795
oranges      403.428793
banana        20.085537
dtype: float64

In [33]:
np.power(fruits, 2)

apples     100
oranges     36
banana       9
dtype: int64

In [34]:
fruits['banana'] + 2

5

In [35]:
fruits.iloc[0] - 2

8

In [36]:
fruits[['apples', 'oranges']] * 2

apples     20
oranges    12
dtype: int64

In [37]:
fruits.loc[['apples', 'oranges']] - 2

apples     8
oranges    4
dtype: int64

We can apply operations to a Series with mixed data types as well

In [38]:
newg = groceries  # a second name for the same Series, not a copy (groceries.copy() would give an independent one)

In [39]:
newg

eggs       2
milk     yes
bread     no
dtype: object

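This works as long as the operation is valid for every data type in the Series. For example, / would fail here because a string can't be divided, while * works: the number is multiplied and each string is repeated.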

In [40]:
newg * 2

eggs          4
milk     yesyes
bread      nono
dtype: object
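The sketch below recreates newg and confirms both halves of that claim: * works element by element, while / raises a TypeError because of the strings.

```python
import pandas as pd

newg = pd.Series([2, 'yes', 'no'], index=['eggs', 'milk', 'bread'])

# * works for every element: 2 * 2 == 4 and 'yes' * 2 == 'yesyes'
doubled = newg * 2

# / is not defined for strings, so the whole operation fails
try:
    newg / 2
    division_failed = False
except TypeError:
    division_failed = True

print(doubled.tolist(), division_failed)  # [4, 'yesyes', 'nono'] True
```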

## DataFrames

A DataFrame is a two-dimensional data structure with labeled rows and columns, similar to an Excel spreadsheet

Let's create a DataFrame from a dictionary of Series

In [41]:
items = {
    'bob': pd.Series([245,25,55], index=['bike', 'pants', 'watch']),
    'alice': pd.Series([40,110,500,45], index=[ 'book', 'glasses', 'bike', 'pants'])
}

type(items)

dict

don't forget to capitalize the D and F in DataFrame

In [42]:
shopping_carts = pd.DataFrame(items)
shopping_carts

           bob  alice
bike     245.0  500.0
book       NaN   40.0
glasses    NaN  110.0
pants     25.0   45.0
watch     55.0    NaN
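The NaN values come from how the dictionary of Series is combined: the row labels are the union of both Series' indexes, and any label missing from one Series becomes NaN in its column. Because NaN is a float, pandas also converts both integer columns to float64, which is why 245 shows up as 245.0. A sketch checking this:

```python
import pandas as pd

items = {
    'bob': pd.Series([245, 25, 55], index=['bike', 'pants', 'watch']),
    'alice': pd.Series([40, 110, 500, 45], index=['book', 'glasses', 'bike', 'pants']),
}
shopping_carts = pd.DataFrame(items)

# the rows cover every label that appears in either Series
print(shopping_carts.index.tolist())
# labels a person did not buy become NaN ...
print(shopping_carts.isna().sum().to_dict())  # {'bob': 2, 'alice': 1}
# ... which is why both integer columns became float64
print(shopping_carts.dtypes.tolist())
```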


We can also create a DataFrame from Series that have no index labels

In [43]:
data = {'bob': pd.Series([245,25,55]),
      'alice': pd.Series([40,110,500,45])}
df = pd.DataFrame(data)
df

     bob  alice
0  245.0     40
1   25.0    110
2   55.0    500
3    NaN     45


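As you can see above, without labels the rows get a default integer index starting at 0. Bob's Series has only three values, so his entry in row 3 is NaN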

We can get helpful information from our DataFrame, like the row labels (.index), the column labels (.columns), and the data (.values)

In [44]:
shopping_carts.index

Index(['bike', 'book', 'glasses', 'pants', 'watch'], dtype='object')

In [45]:
shopping_carts.columns

Index(['bob', 'alice'], dtype='object')

In [46]:
shopping_carts.values

array([[245., 500.],
       [ nan,  40.],
       [ nan, 110.],
       [ 25.,  45.],
       [ 55.,  nan]])

We can use the same attributes as with a Series to get information about size and shape: .shape, .size, and .ndim (number of dimensions)

In [47]:
shopping_carts.shape

(5, 2)

In [48]:
shopping_carts.size

10

In [49]:
shopping_carts.ndim

2

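.shift() returns a separate DataFrame with the values moved down by one row along the index (the default is periods=1). The row labels stay where they are, and the first row is filled with NaN. It does not change the original DataFrame. Note the parentheses: without them, shopping_carts.shift only refers to the method instead of calling it.

In [50]:
shopping_carts.shift()

           bob  alice
bike       NaN    NaN
book     245.0  500.0
glasses    NaN   40.0
pants      NaN  110.0
watch     25.0   45.0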
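A small sketch of .shift() in both directions, using a hypothetical three-row slice of the carts: positive periods move values down, negative periods move them up, and the original stays unchanged.

```python
import pandas as pd

df = pd.DataFrame({'bob': [245.0, None, 25.0], 'alice': [500.0, 40.0, 45.0]},
                  index=['bike', 'book', 'pants'])

down = df.shift()   # periods=1: values move down one row, first row becomes NaN
up = df.shift(-1)   # periods=-1: values move up one row, last row becomes NaN

print(down)
print(up)
print(df)           # the original is unchanged
```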

In [51]:
shopping_carts


           bob  alice
bike     245.0  500.0
book       NaN   40.0
glasses    NaN  110.0
pants     25.0   45.0
watch     55.0    NaN


When we create a DataFrame from a dictionary, we can choose which keys to keep as columns with the columns argument

In [52]:
bob_shopping_cart = pd.DataFrame(items, columns=['bob'])
bob_shopping_cart

       bob
bike   245
pants   25
watch   55


Or we can keep every column but only certain rows, using the index argument

In [53]:
sel_shopping_cart = pd.DataFrame(items, index=['pants', 'book'])
sel_shopping_cart

        bob  alice
pants  25.0     45
book    NaN     40


We can get even more specific and select only certain columns and certain rows

In [54]:
alice_sel_shopping_cart = pd.DataFrame(items, index=['glasses', 'bike'], columns=['alice'])
alice_sel_shopping_cart

         alice
glasses    110
bike       500


We can also create DataFrames from a dictionary of lists


In [55]:
data = {'integers': [1,2,3],
       'floats': [1.1,2.2,3.3]}
df = pd.DataFrame(data)
df

   integers  floats
0         1     1.1
1         2     2.2
2         3     3.3


As you can see, this creates a DataFrame without row labels; pandas uses a default integer index instead. We can always add row labels with the index argument when creating a DataFrame

In [56]:
df = pd.DataFrame(data, index=['idx 1', 'idx 2', 'idx 3'])
df

       integers  floats
idx 1         1     1.1
idx 2         2     2.2
idx 3         3     3.3


another method is to create a data frame using a list of dictionaries

In [57]:
items =[{'bikes': 20, 'pants': 30, 'watches': 35}, {'watches': 10, 'glasses': 50, 'pants': 35, 'bikes': 12}]
store_items = pd.DataFrame(items, index=['store 1', 'store 2'])
store_items

         bikes  glasses  pants  watches
store 1     20      NaN     30       35
store 2     12     50.0     35       10


### Accessing Elements in DataFrames

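We can access rows, columns, or individual elements

First we'll go by column. Passing a list of labels (note the double brackets) returns a DataFrame; a single label like store_items['bikes'] would return a Series instead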

In [58]:
store_items[['bikes']]

         bikes
store 1     20
store 2     12


We can get more than one column at once

In [59]:
store_items[['bikes', 'pants']]

         bikes  pants
store 1     20     30
store 2     12     35


Here's how to access a row by its label with .loc

In [60]:
store_items.loc[['store 1']]

         bikes  glasses  pants  watches
store 1     20      NaN     30       35


We can get the value at a specific row and column as follows

In [61]:
store_items['bikes']['store 1']

20

As you can see above, with this chained indexing the column label comes first and the row label second
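For reading a single value, .loc takes the row first and the column second, and .at does the same for a single scalar. A sketch using the store_items data from above; for assigning values, .loc is the safer choice, since chained assignment may act on a temporary copy instead of the original DataFrame.

```python
import pandas as pd

store_items = pd.DataFrame(
    [{'bikes': 20, 'pants': 30, 'watches': 35},
     {'watches': 10, 'glasses': 50, 'pants': 35, 'bikes': 12}],
    index=['store 1', 'store 2'])

chained = store_items['bikes']['store 1']       # column first, then row
with_loc = store_items.loc['store 1', 'bikes']  # row first, then column
with_at = store_items.at['store 1', 'bikes']    # scalar access, same order as .loc

print(chained, with_loc, with_at)  # 20 20 20
```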

#### adding items to a data frame

In [62]:
store_items['shirts'] = [5, 12]

In [63]:
store_items

         bikes  glasses  pants  watches  shirts
store 1     20      NaN     30       35       5
store 2     12     50.0     35       10      12


We can use arithmetic on existing columns to create new columns

In [64]:
store_items['suits'] = store_items['shirts'] + store_items['pants']
store_items

         bikes  glasses  pants  watches  shirts  suits
store 1     20      NaN     30       35       5     35
store 2     12     50.0     35       10      12     47


now suppose we want to add a new row

we start by creating a new DataFrame then add it on to the previously existing DataFrame

In [65]:
new_items = [{'bikes': 20, 'glasses': 5, 'watches': 44, 'pants': 9}]
new_store = pd.DataFrame(new_items, index=['store 3'])
new_store

         bikes  glasses  pants  watches
store 3     20        5      9       44


In [66]:
store_items = pd.concat([store_items, new_store], sort=True)  # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
store_items

         bikes  glasses  pants  shirts  suits  watches
store 1     20      NaN     30     5.0   35.0       35
store 2     12     50.0     35    12.0   47.0       10
store 3     20      5.0      9     NaN    NaN       44


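We can add new columns to only certain rows. Here we add new_watches to only stores 2 and 3: the assignment matches rows by their labels, so store 1, which is missing from store_items['watches'][1:], gets NaN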

In [67]:
store_items['new_watches'] = store_items['watches'][1:]

In [68]:
store_items

         bikes  glasses  pants  shirts  suits  watches  new_watches
store 1     20      NaN     30     5.0   35.0       35          NaN
store 2     12     50.0     35    12.0   47.0       10         10.0
store 3     20      5.0      9     NaN    NaN       44         44.0
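A minimal sketch of why store 1 gets NaN: assigning a Series to a column matches rows by label, not by position. The hypothetical same_watches column shows that even a reversed Series lands on the right rows.

```python
import pandas as pd

df = pd.DataFrame({'watches': [35, 10, 44]}, index=['store 1', 'store 2', 'store 3'])

# the right-hand side only has the labels 'store 2' and 'store 3'
df['new_watches'] = df['watches'].iloc[1:]

# rows are matched by label, so reversing the order changes nothing
df['same_watches'] = df['watches'].iloc[::-1]

print(df)
```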


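We can even place a new column at a specific position with .insert. The first argument is the position (column index), the second is the label, and the third is the data. .insert changes the DataFrame in place and returns None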

In [69]:
store_items.insert(5, 'shoes', [6, 4, 5])

In [70]:
store_items

         bikes  glasses  pants  shirts  suits  shoes  watches  new_watches
store 1     20      NaN     30     5.0   35.0      6       35          NaN
store 2     12     50.0     35    12.0   47.0      4       10         10.0
store 3     20      5.0      9     NaN    NaN      5       44         44.0


We can delete columns with .pop and delete rows or columns with .drop

First we'll delete a column with .pop, which removes it from the DataFrame in place and returns the removed column as a Series

In [71]:
store_items.pop('new_watches')
store_items

         bikes  glasses  pants  shirts  suits  shoes  watches
store 1     20      NaN     30     5.0   35.0      6       35
store 2     12     50.0     35    12.0   47.0      4       10
store 3     20      5.0      9     NaN    NaN      5       44


Now we'll remove two columns with .drop, using the argument axis=1 to show that we want to delete columns and not rows

In [72]:
store_items = store_items.drop(['shoes', 'watches'], axis=1)
store_items

         bikes  glasses  pants  shirts  suits
store 1     20      NaN     30     5.0   35.0
store 2     12     50.0     35    12.0   47.0
store 3     20      5.0      9     NaN    NaN


Finally, we'll delete a couple of rows, this time with axis=0

In [73]:
store_items = store_items.drop(['store 1', 'store 3'], axis=0)
store_items

         bikes  glasses  pants  shirts  suits
store 2     12     50.0     35    12.0   47.0
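.drop also accepts the keyword arguments columns= and index=, which some find easier to read than axis. A sketch on a hypothetical small frame:

```python
import pandas as pd

df = pd.DataFrame({'bikes': [20, 12, 20], 'shoes': [6, 4, 5], 'watches': [35, 10, 44]},
                  index=['store 1', 'store 2', 'store 3'])

# same as df.drop(['shoes', 'watches'], axis=1)
no_cols = df.drop(columns=['shoes', 'watches'])
# same as df.drop(['store 1', 'store 3'], axis=0)
no_rows = df.drop(index=['store 1', 'store 3'])

print(no_cols.columns.tolist())  # ['bikes']
print(no_rows.index.tolist())    # ['store 2']
```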


#### Changing DataFrame column labels

We use the .rename method, passing its columns argument a dictionary with the old names as keys and the new names as the corresponding values

In [74]:
store_items = store_items.rename(columns={'bikes': 'hats'})
store_items

         hats  glasses  pants  shirts  suits
store 2    12     50.0     35    12.0   47.0


We can also use .rename to change row labels; just name the argument index instead of columns

In [75]:
store_items = store_items.rename(index={'store 2': 'last store'})
store_items

            hats  glasses  pants  shirts  suits
last store    12     50.0     35    12.0   47.0


If you want, you can even use a column's values as the row labels with .set_index

In [76]:
store_items = store_items.set_index('pants')
store_items

       hats  glasses  shirts  suits
pants
35       12     50.0    12.0   47.0
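.set_index moves the column into the index, so pants no longer appears among the columns. A sketch on a hypothetical one-row frame, with .reset_index moving it back into the columns (the old row label is not restored, because .set_index replaced it):

```python
import pandas as pd

df = pd.DataFrame({'hats': [12], 'pants': [35], 'suits': [47.0]}, index=['last store'])

by_pants = df.set_index('pants')  # 'pants' moves from the columns into the index
print(by_pants.index.name, by_pants.columns.tolist())  # pants ['hats', 'suits']

restored = by_pants.reset_index()  # the index goes back to being a column
print(restored.columns.tolist())   # ['pants', 'hats', 'suits']
```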


## Cleaning The Data (Dealing with NaN)

In machine learning we need to make sure our data isn't missing any values. Real-world data is usually far from perfect when we get it, so we need to clean it up, removing or replacing values like NaN, before we can feed it to our machine learning algorithms.

Let's start with a new DataFrame

In [77]:
items = [{'bikes': 20, 'pants': 30, 'watches': 35, 'shirts': 35, 'shoes': 8, 'suits': 45},
        {'watches': 10, 'glasses': 50, 'bikes': 35, 'pants': 5, 'shirts': 2, 'shoes': 5, 'suits': 7},
        {'bikes': 20, 'pants': 30, 'watches': 35, 'glasses': 4, 'shoes': 10}]
store_items = pd.DataFrame(items, index=['store 1', 'store 2', 'store 3'])
store_items

         bikes  glasses  pants  shirts  shoes  suits  watches
store 1     20      NaN     30    35.0      8   45.0       35
store 2     35     50.0      5     2.0      5    7.0       10
store 3     20      4.0     30     NaN     10    NaN       35


we can combine .isnull and .sum to get the total number of NaN values in our DataFrame

In [78]:
NaNs = store_items.isnull()  # True wherever a value is NaN
print(NaNs)

         bikes  glasses  pants  shirts  shoes  suits  watches
store 1  False     True  False   False  False  False    False
store 2  False    False  False   False  False  False    False
store 3  False    False  False    True  False   True    False


On its own, .isnull returns a DataFrame of booleans: True where a value is NaN, False otherwise. When summed, True counts as 1 and False as 0, so .sum() counts the NaN values. A single .sum() gives the number of NaNs in each column, as you can see below.

In [79]:
NaNs = store_items.isnull().sum()
print(NaNs)

bikes      0
glasses    1
pants      0
shirts     1
shoes      0
suits      1
watches    0
dtype: int64


if we want the total for the whole DataFrame we just add a second .sum()

In [80]:
NaNs = store_items.isnull().sum().sum()
print(NaNs)

3


We could also do this the other way around by counting the non-NaN values instead. To do this we use .count

In [81]:
store_items.count()

bikes      3
glasses    2
pants      3
shirts     2
shoes      3
suits      2
watches    3
dtype: int64

So, to get the number of NaN values from .count, we subtract the total number of non-NaN values from the total number of elements (.size)

In [82]:
NaNs = store_items.size - store_items.count().sum()
print(NaNs)

3
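Both counting approaches should agree. A sketch that rebuilds store_items from the cell above and computes the NaN total both ways (7 columns times 3 rows gives 21 cells, of which 18 hold values):

```python
import pandas as pd

items = [{'bikes': 20, 'pants': 30, 'watches': 35, 'shirts': 35, 'shoes': 8, 'suits': 45},
         {'watches': 10, 'glasses': 50, 'bikes': 35, 'pants': 5, 'shirts': 2, 'shoes': 5, 'suits': 7},
         {'bikes': 20, 'pants': 30, 'watches': 35, 'glasses': 4, 'shoes': 10}]
store_items = pd.DataFrame(items, index=['store 1', 'store 2', 'store 3'])

# count the True values of .isnull()
via_isnull = store_items.isnull().sum().sum()
# or subtract the non-NaN count from the total number of cells
via_count = store_items.size - store_items.count().sum()

print(via_isnull, via_count)  # 3 3
```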


Once we have found our NaNs, we have two choices: remove them or replace them.

### Removing and Replacing NaN values

first we'll remove

Using .dropna we can remove whole rows (axis=0) or columns (axis=1) that contain NaN values; you just have to specify the axis.
By default .dropna returns a new DataFrame with the relevant rows or columns removed and does not modify the original DataFrame. To remove them from the original DataFrame instead, we add the inplace argument set to True, i.e. ```inplace=True```

In [83]:
#remove NaN rows
store_items.dropna(axis=0)

         bikes  glasses  pants  shirts  shoes  suits  watches
store 2     35     50.0      5     2.0      5    7.0       10


In [84]:
#remove NaN columns
store_items.dropna(axis=1)

         bikes  pants  shoes  watches
store 1     20     30      8       35
store 2     35      5      5       10
store 3     20     30     10       35
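A sketch of the difference between the default behavior and inplace=True, on a hypothetical two-column frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'bikes': [20, 35, 20], 'glasses': [np.nan, 50.0, 4.0]},
                  index=['store 1', 'store 2', 'store 3'])

kept = df.dropna(axis=0)   # a new DataFrame without 'store 1'
print(len(kept), len(df))  # 2 3, df itself is unchanged

result = df.dropna(axis=0, inplace=True)
print(result, len(df))     # None 2
```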


Now we'll replace NaN values instead of eliminating them.

We can do this with the .fillna method, whose first argument is the value we want to put in place of the NaN values

In [85]:
store_items.fillna(0)

         bikes  glasses  pants  shirts  shoes  suits  watches
store 1     20      0.0     30    35.0      8   45.0       35
store 2     35     50.0      5     2.0      5    7.0       10
store 3     20      4.0     30     0.0     10    0.0       35
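Besides a constant, .fillna also accepts a Series and matches its labels to the column names. One common choice, shown here as a sketch rather than a rule, is to fill each column with its own mean, so the glasses, shirts and suits NaNs become 27.0, 18.5 and 26.0:

```python
import pandas as pd

items = [{'bikes': 20, 'pants': 30, 'watches': 35, 'shirts': 35, 'shoes': 8, 'suits': 45},
         {'watches': 10, 'glasses': 50, 'bikes': 35, 'pants': 5, 'shirts': 2, 'shoes': 5, 'suits': 7},
         {'bikes': 20, 'pants': 30, 'watches': 35, 'glasses': 4, 'shoes': 10}]
store_items = pd.DataFrame(items, index=['store 1', 'store 2', 'store 3'])

# column means skip NaN; fillna matches the Series labels to the column names
column_means = store_items.mean()
filled = store_items.fillna(column_means)

print(filled)
```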


we can also use a technique called forward filling to replace NaN values with the previous value along the axis we specify. This only works if there is a previous value on that axis. Otherwise the value stays NaN

In [86]:
store_items.ffill(axis=0)  # same as the older fillna(method='ffill', axis=0), whose method argument is deprecated since pandas 2.1

         bikes  glasses  pants  shirts  shoes  suits  watches
store 1     20      NaN     30    35.0      8   45.0       35
store 2     35     50.0      5     2.0      5    7.0       10
store 3     20      4.0     30     2.0     10    7.0       35


In [87]:
store_items.ffill(axis=1)  # same as the older fillna(method='ffill', axis=1)

         bikes  glasses  pants  shirts  shoes  suits  watches
store 1   20.0     20.0   30.0    35.0    8.0   45.0     35.0
store 2   35.0     50.0    5.0     2.0    5.0    7.0     10.0
store 3   20.0      4.0   30.0    30.0   10.0   10.0     35.0


A similar technique, backward filling, pulls the next value along the given axis instead of the previous one. If there is no next value on that axis, the value stays NaN.

In [88]:
store_items.bfill(axis=1)  # same as the older fillna(method='bfill', axis=1)

         bikes  glasses  pants  shirts  shoes  suits  watches
store 1   20.0     30.0   30.0    35.0    8.0   45.0     35.0
store 2   35.0     50.0    5.0     2.0    5.0    7.0     10.0
store 3   20.0      4.0   30.0    10.0   10.0   35.0     35.0


In [89]:
store_items.bfill(axis=0)  # same as the older fillna(method='bfill', axis=0)

         bikes  glasses  pants  shirts  shoes  suits  watches
store 1     20     50.0     30    35.0      8   45.0       35
store 2     35     50.0      5     2.0      5    7.0       10
store 3     20      4.0     30     NaN     10    NaN       35


Another way to replace NaNs is with interpolation, using the .interpolate method. Similar to the filling methods above, it takes the method we want to apply and an axis.

One popular interpolation method is 'linear', which treats the values as equally spaced and places each NaN on a straight line between its neighbors. But like forward filling, 'linear' needs a previous value along the given axis to draw upon, so a NaN at the start of that axis stays NaN.

In [90]:
store_items.interpolate(method='linear', axis=0)

         bikes  glasses  pants  shirts  shoes  suits  watches
store 1     20      NaN     30    35.0      8   45.0       35
store 2     35     50.0      5     2.0      5    7.0       10
store 3     20      4.0     30     2.0     10    7.0       35


In [91]:
store_items.interpolate(method='linear', axis=1)

         bikes  glasses  pants  shirts  shoes  suits  watches
store 1   20.0     25.0   30.0    35.0    8.0   45.0     35.0
store 2   35.0     50.0    5.0     2.0    5.0    7.0     10.0
store 3   20.0      4.0   30.0    20.0   10.0   22.5     35.0
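A one-dimensional sketch of these rules, using a hypothetical Series: the interior NaN lands halfway between its neighbors, the leading NaN has no previous value and stays NaN, and the trailing NaN takes the last known value (the same thing happened to shirts and suits in the axis=0 output above).

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 30.0, np.nan, 10.0, np.nan])
filled = s.interpolate(method='linear')

print(filled.tolist())  # [nan, 30.0, 20.0, 10.0, 10.0]
```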


## Loading Other Data into a DataFrame

We can load CSV data with pd.read_csv; here GOOG.csv is stored in the same directory as the notebook

In [94]:
Google_stock = pd.read_csv('./GOOG.csv')
Google_stock

         Date       Open       High        Low      Close  Adj Close    Volume
0  2004-08-19  49.676899  51.693783  47.669952  49.845802  49.845802  44994500
1  2004-08-20  50.178635  54.187561  49.925285  53.805050  53.805050  23005800
2  2004-08-23  55.017166  56.373344  54.172661  54.346527  54.346527  18393200
3  2004-08-24  55.260582  55.439419  51.450363  52.096165  52.096165  15361800
4  2004-08-25  52.140873  53.651051  51.604362  52.657513  52.657513   9257400
5  2004-08-26  52.135906  53.626213  51.991844  53.606342  53.606342   7148200
6  2004-08-27  53.700729  53.959049  52.503513  52.732029  52.732029   6258300
7  2004-08-30  52.299839  52.404160  50.675404  50.675404  50.675404   5235700
8  2004-08-31  50.819469  51.519913  50.749920  50.854240  50.854240   4954800
9  2004-09-01  51.018177  51.152302  49.512966  49.801090  49.801090   9206800
...
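read_csv can also turn the Date strings into real timestamps and use them as the row index. In this sketch the parse_dates and index_col arguments are additions, not part of the original cell, and two rows copied from the output above stand in for GOOG.csv through io.StringIO.

```python
import io

import pandas as pd

# two rows copied from the GOOG.csv output above stand in for the file
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2004-08-19,49.676899,51.693783,47.669952,49.845802,49.845802,44994500
2004-08-20,50.178635,54.187561,49.925285,53.805050,53.805050,23005800
"""

# parse the Date column as datetimes and use it as the row index
stock = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'], index_col='Date')

print(stock.index)
print(stock.loc[pd.Timestamp('2004-08-20'), 'High'])  # 54.187561
```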


We can get the first five rows of a DataFrame with .head

In [95]:
Google_stock.head()

         Date       Open       High        Low      Close  Adj Close    Volume
0  2004-08-19  49.676899  51.693783  47.669952  49.845802  49.845802  44994500
1  2004-08-20  50.178635  54.187561  49.925285  53.805050  53.805050  23005800
2  2004-08-23  55.017166  56.373344  54.172661  54.346527  54.346527  18393200
3  2004-08-24  55.260582  55.439419  51.450363  52.096165  52.096165  15361800
4  2004-08-25  52.140873  53.651051  51.604362  52.657513  52.657513   9257400



we can get the last 5 rows in a similar way with .tail

In [96]:
Google_stock.tail()

            Date        Open        High         Low       Close   Adj Close   Volume
3308  2017-10-09       980.0  985.424988  976.109985       977.0       977.0   891400
3309  2017-10-10       980.0  981.570007  966.080017  972.599976  972.599976   968400
3310  2017-10-11  973.719971  990.710022      972.25      989.25      989.25  1693300
3311  2017-10-12  987.450012  994.119995       985.0  987.830017  987.830017  1262400
3312  2017-10-13       992.0  997.210022       989.0  989.679993  989.679993  1157700


we can specify how many rows we want when using the head or tail method

In [98]:
Google_stock.head(3)

         Date       Open       High        Low      Close  Adj Close    Volume
0  2004-08-19  49.676899  51.693783  47.669952  49.845802  49.845802  44994500
1  2004-08-20  50.178635  54.187561  49.925285  53.805050  53.805050  23005800
2  2004-08-23  55.017166  56.373344  54.172661  54.346527  54.346527  18393200


Another way to check for NaN values is .isnull followed by .any, which shows for each column whether any of its values is NaN

In [99]:
Google_stock.isnull().any()

Date         False
Open         False
High         False
Low          False
Close        False
Adj Close    False
Volume       False
dtype: bool
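Chaining a second .any collapses the per-column result into a single True or False for the whole DataFrame. A sketch with a hypothetical frame that has one missing Volume value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Open': [49.68, 50.18], 'Volume': [44994500, np.nan]})

per_column = df.isnull().any()      # one True/False per column
anywhere = df.isnull().any().any()  # one True/False for the whole DataFrame

print(per_column)
print(anywhere)  # True
```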

A great way to get descriptive statistics quickly is the .describe method. By default it summarizes only the numeric columns, which is why Date doesn't appear

In [100]:
Google_stock.describe()

             Open        High         Low       Close   Adj Close      Volume
count      3313.0      3313.0      3313.0      3313.0      3313.0      3313.0
mean   380.186092   383.49374  376.519309  380.072458  380.072458   8038476.0
std     223.81865  224.974534  222.473232   223.85378   223.85378   8399521.0
min     49.274517   50.541279   47.669952   49.681866   49.681866      7900.0
25%    226.556473  228.394516  224.003082   226.40744   226.40744   2584900.0
50%    293.312286  295.433502  289.929291  293.029114  293.029114   5281300.0
75%    536.650024       540.0  532.409973  536.690002  536.690002  10653700.0
max         992.0  997.210022       989.0  989.679993  989.679993  82768100.0


we can get the describe stats for just one column as follows

In [101]:
Google_stock['High'].describe()

count    3313.000000
mean      383.493740
std       224.974534
min        50.541279
25%       228.394516
50%       295.433502
75%       540.000000
max       997.210022
Name: High, dtype: float64

we can also get just one stat at a time. Pandas includes a lot of these stat methods like .max, .min, and .mean

In [102]:
Google_stock.max()

Date         2017-10-13
Open                992
High             997.21
Low                 989
Close            989.68
Adj Close        989.68
Volume         82768100
dtype: object

In [103]:
Google_stock.min()

Date         2004-08-19
Open            49.2745
High            50.5413
Low               47.67
Close           49.6819
Adj Close       49.6819
Volume             7900
dtype: object

In [104]:
Google_stock.mean(numeric_only=True)  # skip the non-numeric Date column; pandas 2.0+ raises a TypeError without this

Open         3.801861e+02
High         3.834937e+02
Low          3.765193e+02
Close        3.800725e+02
Adj Close    3.800725e+02
Volume       8.038476e+06
dtype: float64

these of course work for a single column as well

In [105]:
Google_stock['Open'].mean()

380.1860917132517

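Another thing you'll want to find is the correlation between columns. We can get this with the .corr method, which by default computes the Pearson correlation coefficient (from -1 to 1) for every pair of columns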

In [106]:
Google_stock.corr(numeric_only=True)  # skip the non-numeric Date column; pandas 2.0+ raises an error without this

                Open       High        Low      Close  Adj Close     Volume
Open             1.0   0.999904   0.999845   0.999745   0.999745  -0.564258
High        0.999904        1.0   0.999834   0.999868   0.999868  -0.562749
Low         0.999845   0.999834        1.0   0.999899   0.999899  -0.567007
Close       0.999745   0.999868   0.999899        1.0        1.0  -0.564967
Adj Close   0.999745   0.999868   0.999899        1.0        1.0  -0.564967
Volume     -0.564258  -0.562749  -0.567007  -0.564967  -0.564967        1.0
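Values near 1 mean the price columns move almost perfectly together, while about -0.56 means Volume tends to be lower when prices are higher. A sketch with hypothetical columns x, y and z shows the two extremes, plus Series.corr for a single pair of columns:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [3, 5, 7, 9], 'z': [8, 6, 4, 2]})

xy = df['x'].corr(df['y'])  # y rises exactly in step with x
xz = df['x'].corr(df['z'])  # z falls exactly as x rises

print(round(xy, 6), round(xz, 6))  # 1.0 -1.0
print(df.corr())                   # full pairwise Pearson matrix
```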


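We can use the .groupby method to group rows that share the same value in one or more columns and compute an aggregate (such as a sum or mean) for each group. In this dataset every Date is unique, so each group holds a single row and the result simply repeats the High values; the example would make more sense if the dates were grouped by year instead of by day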

In [107]:
Google_stock.groupby(['Date'])['High'].sum()

Date
2004-08-19     51.693783
2004-08-20     54.187561
2004-08-23     56.373344
2004-08-24     55.439419
2004-08-25     53.651051
2004-08-26     53.626213
2004-08-27     53.959049
2004-08-30     52.404160
2004-08-31     51.519913
2004-09-01     51.152302
2004-09-02     50.854240
2004-09-03     50.541279
2004-09-07     50.670437
2004-09-08     51.182110
2004-09-09     51.023144
2004-09-10     52.935703
2004-09-13     53.854729
2004-09-14     55.638126
2004-09-15     56.745922
2004-09-16     57.525848
2004-09-17     58.365391
2004-09-20     60.407108
2004-09-21     59.820923
2004-09-22     59.448345
2004-09-23     60.918781
2004-09-24     61.649033
2004-09-27     60.049435
2004-09-28     63.288372
2004-09-29     67.073753
2004-09-30     65.722542
                 ...    
2017-09-01    942.479980
2017-09-05    937.000000
2017-09-06    930.914978
2017-09-07    936.409973
2017-09-08    936.989990
2017-09-11    938.380005
2017-09-12    933.479980
2017-09-13    937.250000
2017-09-14    932.77

Here we get the mean of each group instead

In [108]:
Google_stock.groupby(['Date'])['High'].mean()

Date
2004-08-19     51.693783
2004-08-20     54.187561
2004-08-23     56.373344
2004-08-24     55.439419
2004-08-25     53.651051
2004-08-26     53.626213
2004-08-27     53.959049
2004-08-30     52.404160
2004-08-31     51.519913
2004-09-01     51.152302
2004-09-02     50.854240
2004-09-03     50.541279
2004-09-07     50.670437
2004-09-08     51.182110
2004-09-09     51.023144
2004-09-10     52.935703
2004-09-13     53.854729
2004-09-14     55.638126
2004-09-15     56.745922
2004-09-16     57.525848
2004-09-17     58.365391
2004-09-20     60.407108
2004-09-21     59.820923
2004-09-22     59.448345
2004-09-23     60.918781
2004-09-24     61.649033
2004-09-27     60.049435
2004-09-28     63.288372
2004-09-29     67.073753
2004-09-30     65.722542
                 ...    
2017-09-01    942.479980
2017-09-05    937.000000
2017-09-06    930.914978
2017-09-07    936.409973
2017-09-08    936.989990
2017-09-11    938.380005
2017-09-12    933.479980
2017-09-13    937.250000
2017-09-14    932.77
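A sketch of the grouping the paragraph above has in mind, using four Date/High pairs copied from the outputs in this notebook: a Year column (added here for illustration) gives several rows per key, so the aggregate actually combines rows.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2004-08-19', '2004-08-20', '2017-10-12', '2017-10-13'],
    'High': [51.693783, 54.187561, 994.119995, 997.210022],
})

# a hypothetical Year column so that each key covers several rows
df['Year'] = pd.to_datetime(df['Date']).dt.year

yearly_high = df.groupby('Year')['High'].max()
print(yearly_high)
```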

## Creating DataFrames with .date_range

We can also create a DataFrame whose index is a range of dates built with pd.date_range (one entry per day by default), as follows

In [109]:
dates = pd.date_range('2000-01-01', '2010-02-22')
all_date_dataframe = pd.DataFrame(index=dates)

In [110]:
all_date_dataframe.head()

2000-01-01
2000-01-02
2000-01-03
2000-01-04
2000-01-05
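A sketch that checks what date_range produced and fills the empty DataFrame with a column; the day_of_week column is a hypothetical example, not part of the original notebook.

```python
import pandas as pd

dates = pd.date_range('2000-01-01', '2010-02-22')  # daily frequency by default
all_date_dataframe = pd.DataFrame(index=dates)

print(len(dates))                          # 3706 days, both endpoints included
print(dates[0].date(), dates[-1].date())   # 2000-01-01 2010-02-22

# an empty DataFrame indexed by dates can then receive columns
all_date_dataframe['day_of_week'] = all_date_dataframe.index.day_name()
print(all_date_dataframe.head())
```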
