# Data Analysis Fundamentals
## Analyzing and Visualizing Data
### Aggregating Data
Aggregating data is often the first analysis you perform, because it summarizes the data and gives you a feel for it as a whole. Two common starting points are counting the rows of data you have and summing your numeric values. So let's get started by loading our data.

In [11]:
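%matplotlib inline
import pandas as pd
from matplotlib import pyplot as plt

# Load the lemonade data, using the Date column as a datetime index
df = pd.read_csv('data/Lemonade-2.csv', index_col='Date', parse_dates=True, na_values='nan')
# Keep the first 14 rows (two weeks) for the examples that follow
df_2_week = df.head(14)
df.head()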

Date,Day,Temperature,Rainfall,Flyers,Price,Sales,Revenue
2017-01-01,Sunday,80.6,2.0,15,0.3,10,3.0
2017-02-01,Monday,84.02,1.33,15,0.3,13,3.9
2017-03-01,Tuesday,94.1,1.33,27,0.3,15,4.5
2017-04-01,Wednesday,111.38,1.05,28,0.3,17,5.1
2017-05-01,Thursday,108.32,1.0,33,0.3,18,5.4
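
In the output above, the row for Monday is dated 2017-02-01, a month rather than a day after Sunday's 2017-01-01. That pattern suggests the file stores its dates day-first (for example 2/1/2017 for 2 January) and they are being read month-first. Below is a minimal sketch of one way to check this, assuming that date format. The two-row sample and the names `sample`, `df_check` and `dates_match_days` are made up for illustration and are not part of the original notebook.

```python
import io
import pandas as pd

# Hypothetical two-row sample; the d/m/Y date format is an assumption about the CSV
sample = io.StringIO("Date,Day,Sales\n1/1/2017,Sunday,10\n2/1/2017,Monday,13\n")

# dayfirst=True tells pandas to read 2/1/2017 as 2 January rather than 1 February
df_check = pd.read_csv(sample, index_col='Date', parse_dates=True, dayfirst=True)

# Compare the weekday implied by each parsed date with the Day column
parsed_days = df_check.index.day_name().to_numpy()
dates_match_days = (parsed_days == df_check['Day'].to_numpy()).all()
print(df_check.index.tolist(), dates_match_days)
```

If the same check holds for the real file, passing dayfirst=True to the read_csv call above should give consecutive dates.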


In [12]:
df_2_week.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 14 entries, 2017-01-01 to 2017-01-14
Data columns (total 7 columns):
Day            14 non-null object
Temperature    14 non-null float64
Rainfall       14 non-null float64
Flyers         14 non-null int64
Price          14 non-null float64
Sales          14 non-null int64
Revenue        14 non-null float64
dtypes: float64(4), int64(2), object(1)
memory usage: 896.0+ bytes


Using .info() we can see the non-null count and data type of each column, but for our data it is better to use .describe(), which also reports summary statistics for every numeric column.

In [14]:
df_2_week.describe()

,Temperature,Rainfall,Flyers,Price,Sales,Revenue
count,14.0,14.0,14.0,14.0,14.0,14.0
mean,97.121429,1.317857,23.0,0.3,14.642857,4.392857
std,11.26938,0.27291,6.101702,5.760664e-17,2.590133,0.77704
min,77.54,1.0,15.0,0.3,10.0,3.0
25%,90.815,1.0825,19.0,0.3,13.0,3.9
50%,99.5,1.33,23.0,0.3,15.0,4.5
75%,106.43,1.4875,27.75,0.3,17.0,5.1
max,111.38,2.0,33.0,0.3,18.0,5.4


The only statistic missing from .describe() is the sum, but getting it is as simple as running .sum().

In [15]:
df_2_week.sum()

Day            SundayMondayTuesdayWednesdayThursdayFridaySatu...
Temperature                                               1359.7
Rainfall                                                   18.45
Flyers                                                       322
Price                                                        4.2
Sales                                                        205
Revenue                                                     61.5
dtype: object
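
Because Day is a text column, .sum() concatenates the day names, which is why the result has dtype object. Below is a minimal sketch of one way to restrict the sum to numeric columns and add it to the .describe() table as an extra row. The frame `sample` is rebuilt by hand from some of the columns of the five rows shown by df.head() earlier, and the names `sample`, `totals` and `summary` are illustrative only.

```python
import pandas as pd

# Some columns of the five rows shown by df.head(), rebuilt by hand for illustration
sample = pd.DataFrame({
    'Day': ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday'],
    'Flyers': [15, 15, 27, 28, 33],
    'Sales': [10, 13, 15, 17, 18],
    'Revenue': [3.0, 3.9, 4.5, 5.1, 5.4],
})

# numeric_only=True skips the text column, so the day names are not concatenated
totals = sample.sum(numeric_only=True)

# describe() covers only the numeric columns; add the totals as a 'sum' row
summary = sample.describe()
summary.loc['sum'] = totals
print(summary)
```

The same pattern should apply to df_2_week, although the totals will of course differ from this five-row sample.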

Some of the sums are potentially useful, such as the total number of sales, total flyers distributed, total revenue and total rainfall. Others, such as total temperature and total price, don't seem to have much meaning. The power of the mean is that it gives us a good feel for a typical value of each field in the dataset as a whole. The min is the lowest value and the max is the highest. We can see that Rosie's slowest day in these two weeks had 10 sales (\$3.00 in revenue) and her best day had 18 sales (\$5.40 in revenue).
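
The min and max tell us the extreme values but not the days on which they occurred. Below is a minimal sketch of how that could be looked up with idxmin() and idxmax(), using the five Sales values shown by df.head() earlier. The Series `sales` and the variable names are illustrative.

```python
import pandas as pd

# Sales for the first five days shown by df.head(), indexed by day name
sales = pd.Series(
    [10, 13, 15, 17, 18],
    index=['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday'],
    name='Sales',
)

# idxmin/idxmax return the index label of the smallest/largest value
lowest_day = sales.idxmin()
highest_day = sales.idxmax()
print(f"Lowest: {lowest_day} ({sales.min()}), highest: {highest_day} ({sales.max()})")
```

On the full data the equivalent call would be df_2_week['Sales'].idxmin(), which returns a date because that frame is indexed by Date.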

## Grouping and Summarizing Data

In [22]:
df_2_week.groupby('Day').describe()

,Flyers,Flyers,Flyers,Flyers,Flyers,Flyers,Flyers,Flyers,Price,Price,...,Sales,Sales,Temperature,Temperature,Temperature,Temperature,Temperature,Temperature,Temperature,Temperature
Day,count,mean,std,min,25%,50%,75%,max,count,mean,...,75%,max,count,mean,std,min,25%,50%,75%,max
Friday,2.0,21.0,2.828427,19.0,20.0,21.0,22.0,23.0,2.0,0.3,...,14.0,15.0,2.0,88.52,15.528065,77.54,83.03,88.52,94.01,99.5
Monday,2.0,17.5,3.535534,15.0,16.25,17.5,18.75,20.0,2.0,0.3,...,16.0,17.0,2.0,92.3,11.709688,84.02,88.16,92.3,96.44,100.58
Saturday,2.0,21.0,2.828427,19.0,20.0,21.0,22.0,23.0,2.0,0.3,...,16.0,17.0,2.0,101.3,14.255273,91.22,96.26,101.3,106.34,111.38
Sunday,2.0,21.5,9.192388,15.0,18.25,21.5,24.75,28.0,2.0,0.3,...,13.75,15.0,2.0,90.05,13.364318,80.6,85.325,90.05,94.775,99.5
Thursday,2.0,24.5,12.020815,16.0,20.25,24.5,28.75,33.0,2.0,0.3,...,17.0,18.0,2.0,104.54,5.345727,100.76,102.65,104.54,106.43,108.32
Tuesday,2.0,30.0,4.242641,27.0,28.5,30.0,31.5,33.0,2.0,0.3,...,17.25,18.0,2.0,102.11,11.327851,94.1,98.105,102.11,106.115,110.12
Wednesday,2.0,25.5,3.535534,23.0,24.25,25.5,26.75,28.0,2.0,0.3,...,15.75,17.0,2.0,101.03,14.63711,90.68,95.855,101.03,106.205,111.38
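
Calling .describe() on a grouped frame produces eight statistics for every column, which is why the output above is truncated with "...". Below is a minimal sketch of a narrower summary that selects one column and names the statistics to compute with .agg(). The Flyers values for three of the days are read off the min and max columns above (each day has exactly two rows, so those are its two values). The names `sample` and `flyers_by_day` are illustrative.

```python
import pandas as pd

# Flyers for three of the days, taken from the min and max columns of the output above
sample = pd.DataFrame({
    'Day': ['Sunday', 'Monday', 'Tuesday', 'Sunday', 'Monday', 'Tuesday'],
    'Flyers': [15, 15, 27, 28, 20, 33],
})

# Group by day, keep only the Flyers column, and compute the chosen statistics
flyers_by_day = sample.groupby('Day')['Flyers'].agg(['sum', 'mean', 'max'])
print(flyers_by_day)
```

The group labels come out in alphabetical order (Monday, Sunday, Tuesday) rather than calendar order, as in the output above.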
