# Chapter 1: Data Structures 

### 1.2 Problem. How can I create a DataFrame object in Pandas?

#### 1.21 What is a DataFrame?

The DataFrame data structure in Pandas is a two-dimensional labeled array.

- Data in the array can be of any type (integers, strings, floating point numbers, Python objects, etc.).

- Data within each column is homogeneous: every column has a single data type (dtype).

- By default, Pandas creates a numerical index for the rows, numbered 0 to n-1 for a DataFrame with n rows.
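A minimal sketch of these properties, using the first day's temperatures for the three cities that appear later in this chapter. The `temp_f` column is an illustrative Fahrenheit conversion added only for this example; it is not part of the chapter's data.

```python
import pandas as pd

# Columns may hold different types, but each column holds a single type.
df = pd.DataFrame({
    "city": ["Tokyo", "Paris", "Mumbai"],  # strings
    "temp_c": [15, -2, 20],                # integers
})
# Illustrative float column (Fahrenheit), derived from temp_c
df["temp_f"] = df["temp_c"] * 9 / 5 + 32

print(df.dtypes)  # one dtype per column
print(df.index)   # default integer index labelling the rows 0, 1, 2
```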


![DataFrame with the default numerical row index](https://github.com/alfredessa/pdacookbook/raw/3c302c20be5695f1b7787e64fb7708fe96757339/images/df1.jpg)

Here is an example in which the Dates column has been set as the index, so the dates label the rows.

![DataFrame indexed by the Dates column](https://github.com/alfredessa/pdacookbook/raw/3c302c20be5695f1b7787e64fb7708fe96757339/images/df2.jpg)

#### 1.22 Preliminaries

Import the pandas and datetime libraries, then create the data that will populate our first DataFrame object.

```python
import pandas as pd
import datetime
```

```python
# Create a list of date strings from 12-01 through 12-07 (the end date, 12-08, is excluded)
dt = datetime.datetime(2013, 12, 1)
end = datetime.datetime(2013, 12, 8)
step = datetime.timedelta(days=1)
dates = []

# Populate the list, formatting each date as month-day
while dt < end:
    dates.append(dt.strftime('%m-%d'))
    dt += step

dates
```

```
['12-01', '12-02', '12-03', '12-04', '12-05', '12-06', '12-07']
```
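The same list can also be built with pandas' own date utilities. This is a sketch of an alternative to the loop above, not the chapter's method; the name `dates_alt` is introduced only for this example.

```python
import pandas as pd

# Seven consecutive days starting 2013-12-01, formatted as month-day strings
dates_alt = pd.date_range("2013-12-01", periods=7, freq="D").strftime("%m-%d").tolist()
print(dates_alt)
```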

```python
# Daily temperatures for three cities, keyed by column name
d = {'Dates': dates,
     'Tokyo': [15, 19, 15, 11, 9, 8, 13],
     'Paris': [-2, 0, 2, 5, 7, -5, -3],
     'Mumbai': [20, 18, 2, 3, 19, 25, 27]}
d
```

```
{'Dates': ['12-01', '12-02', '12-03', '12-04', '12-05', '12-06', '12-07'],
 'Mumbai': [20, 18, 2, 3, 19, 25, 27],
 'Paris': [-2, 0, 2, 5, 7, -5, -3],
 'Tokyo': [15, 19, 15, 11, 9, 8, 13]}
```

#### 1.23 Example 1: Create a DataFrame Object from a Python Dictionary of Equal-Length Lists

```python
temps = pd.DataFrame(d)
```
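Each key of the dictionary becomes a column label and each list supplies that column's values, which is why the lists must all have the same length. A short sketch, using shortened versions of the Tokyo and Paris lists, of what happens when they do not:

```python
import pandas as pd

# Equal-length lists: one row per position, one column per key
ok = pd.DataFrame({"Tokyo": [15, 19, 15], "Paris": [-2, 0, 2]})
print(ok.shape)  # (3, 2)

# Unequal lengths: pandas refuses to build the frame
try:
    pd.DataFrame({"Tokyo": [15, 19, 15], "Paris": [-2, 0]})
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```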

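Selecting a single column by its label returns a pandas Series:

```python
# The Mumbai column, as a Series that shares the DataFrame's row index
ntemp = temps['Mumbai']
ntemp
```

```
0    20
1    18
2     2
3     3
4    19
5    25
6    27
Name: Mumbai, dtype: int64
```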
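Selecting with one label returns a Series, while selecting with a list of labels returns a DataFrame. A self-contained sketch that rebuilds a three-day slice of `temps`; the names `one`, `two`, and `also_one` are introduced only for illustration.

```python
import pandas as pd

temps = pd.DataFrame({
    "Dates": ["12-01", "12-02", "12-03"],
    "Tokyo": [15, 19, 15],
    "Paris": [-2, 0, 2],
    "Mumbai": [20, 18, 2],
})

one = temps["Mumbai"]             # single label -> Series
two = temps[["Mumbai", "Tokyo"]]  # list of labels -> DataFrame
also_one = temps.Mumbai           # attribute access works when the name is a valid identifier

print(type(one).__name__, type(two).__name__)  # Series DataFrame
```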

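Make the Dates column the row index. `set_index` returns a new DataFrame, so assign the result back to `temps`:

```python
temps = temps.set_index('Dates')
temps
```

| Dates | Mumbai | Paris | Tokyo |
|-------|-------:|------:|------:|
| 12-01 | 20 | -2 | 15 |
| 12-02 | 18 | 0 | 19 |
| 12-03 | 2 | 2 | 15 |
| 12-04 | 3 | 5 | 11 |
| 12-05 | 19 | 7 | 9 |
| 12-06 | 25 | -5 | 8 |
| 12-07 | 27 | -3 | 13 |

The columns appear in alphabetical order in this output, which reflects an older pandas release; in recent versions, the columns follow the dictionary's insertion order (Tokyo, Paris, Mumbai).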
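Once Dates is the index, rows can be looked up by label with `.loc`. The sketch below, which rebuilds only the first three days, also shows two related calls: passing the row labels straight to the constructor through `index=`, and `reset_index()` to turn the index back into a column. The names `data`, `temps2`, and `flat` are introduced only for this example.

```python
import pandas as pd

data = {"Tokyo": [15, 19, 15], "Paris": [-2, 0, 2], "Mumbai": [20, 18, 2]}
dates = ["12-01", "12-02", "12-03"]

# Option 1: build with a Dates column, then promote it to the index
temps = pd.DataFrame({"Dates": dates, **data}).set_index("Dates")

# Option 2: supply the row labels directly via the index= argument
temps2 = pd.DataFrame(data, index=pd.Index(dates, name="Dates"))

# Rows can now be looked up by label
print(temps.loc["12-03"])           # the whole row for 12-03
print(temps.loc["12-03", "Paris"])  # a single value: 2

# reset_index() turns the index back into an ordinary column
flat = temps.reset_index()
```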


#### 1.24 Example 2: Create a DataFrame Object by Reading a .csv File

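`pd.read_csv` reads a comma-separated file into a DataFrame. The path is relative to the notebook's working directory.

```python
titanic = pd.read_csv('data/titanic.csv')
```

Display the first 10 rows:

```python
titanic.head(10)
```

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|--:|--:|--:|--:|:--|:--|--:|--:|--:|:--|--:|:--|:--|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| 5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
| 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2 | 3 | 1 | 349909 | 21.0750 | NaN | S |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27 | 0 | 2 | 347742 | 11.1333 | NaN | S |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14 | 1 | 0 | 237736 | 30.0708 | NaN | C |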
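`read_csv` also accepts options that shape the DataFrame as it is read. A minimal sketch, assuming three of the rows shown above (PassengerIds 1, 3 and 4) written out as CSV in an in-memory buffer (`io.StringIO`) instead of `data/titanic.csv`, so it runs without the file: `index_col` makes PassengerId the row index and `usecols` keeps only the listed columns.

```python
import io
import pandas as pd

# Three rows from the output above, written out as CSV text
csv_text = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
"""

sample = pd.read_csv(
    io.StringIO(csv_text),
    index_col="PassengerId",  # use PassengerId as the row labels
    usecols=["PassengerId", "Survived", "Name", "Sex", "Age", "Cabin"],  # keep a subset
)
print(sample)
print(sample["Cabin"].isna())  # empty Cabin fields are read as missing values (NaN)
```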


Count the passengers by outcome (0 = did not survive, 1 = survived):

```python
titanic.Survived.value_counts()
```

```
0    549
1    342
Name: Survived, dtype: int64
```
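`value_counts(normalize=True)` returns proportions instead of raw counts. A sketch that, rather than reading the CSV, rebuilds a stand-in Series with the same counts as the output above (549 zeros and 342 ones):

```python
import pandas as pd

# A stand-in for titanic.Survived with the same counts: 549 zeros, 342 ones
survived = pd.Series([0] * 549 + [1] * 342, name="Survived")

counts = survived.value_counts()
shares = survived.value_counts(normalize=True)  # fractions that sum to 1

print(counts)
print(shares.round(3))  # about 0.616 and 0.384
```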

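Show the five largest values in the Age column, that is, the ages of the five oldest passengers (the row labels identify the passengers):

```python
titanic['Age'].nlargest(5)
```

```
630    80.0
851    74.0
96     71.0
493    71.0
116    70.5
Name: Age, dtype: float64
```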
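`DataFrame.nlargest` does a similar job but returns whole rows, which shows who the oldest passengers are. A sketch using surnames and ages from the first ten rows shown earlier; the missing age (row 5) is skipped, and with the default `keep='first'` the tie at 35 is broken by order of appearance.

```python
import numpy as np
import pandas as pd

# Surnames and ages of the first ten passengers shown above (NaN where Age was missing)
sample = pd.DataFrame({
    "Name": ["Braund", "Cumings", "Heikkinen", "Futrelle", "Allen",
             "Moran", "McCarthy", "Palsson", "Johnson", "Nasser"],
    "Age": [22, 38, 26, 35, 35, np.nan, 54, 2, 27, 14],
})

oldest = sample.nlargest(3, "Age")  # rows with the 3 largest ages
print(oldest)
```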

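Use a boolean mask with `.loc` to keep only the rows for female passengers. The printed frame is wide, so pandas wraps the columns; only the beginning of the output is shown:

```python
print(titanic.loc[titanic['Sex'] == 'female'])
```

```
     PassengerId  Survived  Pclass  \
1              2         1       1
2              3         1       3
3              4         1       1
8              9         1       3
9             10         1       2
10            11         1       3
11            12         1       1
14            15         0       3
15            16         1       2
18            19         0       3
19            20         1       3
22            23         1       3
24            25         0       3
25            26         1       3
28            29         1       3
31            32         1       1
32            33         1       3
38            39         0       3
39            40         1       3
40            41         0       3
41            42         0       2
43            44         1       2
44            45         1       3
47            48         1       3
49            50         0       3
...
```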
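Conditions can be combined with `&` (and) and `|` (or), with each condition in parentheses, or written as a `query` string. A sketch using the Sex and Pclass values of the first ten passengers shown earlier; the names `first_class_women` and `same` are introduced only for this example.

```python
import pandas as pd

# Sex and Pclass for the first ten passengers shown earlier
sample = pd.DataFrame({
    "Sex": ["male", "female", "female", "female", "male",
            "male", "male", "male", "female", "female"],
    "Pclass": [3, 1, 3, 1, 3, 3, 1, 3, 3, 2],
})

# Combine conditions with &; each condition needs its own parentheses
first_class_women = sample.loc[(sample["Sex"] == "female") & (sample["Pclass"] == 1)]

# The same filter written as a query string
same = sample.query("Sex == 'female' and Pclass == 1")

print(first_class_women.index.tolist())  # [1, 3]
```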

#### 1.25 Example 3: Create a DataFrame Object by Reading a .csv File (Olympic Medalists)

```python
medals = pd.read_csv('data/olympicmedals.csv')
```

`tail(10)` shows the last 10 rows:

```python
medals.tail(10)
```

| | City | Edition | Sport | Discipline | Athlete | NOC | Gender | Event | Event_gender | Medal |
|--:|:--|--:|:--|:--|:--|:--|:--|:--|:--|:--|
| 20094 | Barcelona | 1992 | Baseball | Baseball | CHEN, Chi-Hsin | TPE | Men | baseball | M | Silver |
| 20095 | Barcelona | 1992 | Baseball | Baseball | CHEN, Wei-Chen | TPE | Men | baseball | M | Silver |
| 20096 | Barcelona | 1992 | Baseball | Baseball | CHIANG, Tai-Chuan | TPE | Men | baseball | M | Silver |
| 20097 | Barcelona | 1992 | Baseball | Baseball | HUANG, Chung-Yi | TPE | Men | baseball | M | Silver |
| 20098 | Barcelona | 1992 | Baseball | Baseball | HUANG, Wen-Po | TPE | Men | baseball | M | Silver |
| 20099 | Barcelona | 1992 | Baseball | Baseball | JONG, Yeu-Jeng | TPE | Men | baseball | M | Silver |
| 20100 | Barcelona | 1992 | Baseball | Baseball | KU, Kuo-Chian | TPE | Men | baseball | M | Silver |
| 20101 | Barcelona | 1992 | Baseball | Baseball | KUO LEE, Chien-Fu | TPE | Men | baseball | M | Silver |
| 20102 | Barcelona | 1992 | Baseball | Baseball | LIAO, Ming-Hsiung | TPE | Men | baseball | M | Silver |
| 20103 | Barcelona | 1992 | Baseball | Baseball | LIN, Chao-Huang | TPE | Men | baseball | M | Silver |
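`head(n)` and `tail(n)` return the first and last n rows (five by default), and `tail(n)` matches the positional slice `iloc[-n:]`. A sketch on a three-row excerpt of the output above, keeping only three of its columns:

```python
import pandas as pd

# The last three rows of the medals output above (a subset of its columns)
medals = pd.DataFrame(
    {
        "Sport": ["Baseball", "Baseball", "Baseball"],
        "Athlete": ["KUO LEE, Chien-Fu", "LIAO, Ming-Hsiung", "LIN, Chao-Huang"],
        "NOC": ["TPE", "TPE", "TPE"],
    },
    index=[20101, 20102, 20103],
)

last_two = medals.tail(2)   # last n rows
first_two = medals.head(2)  # first n rows
print(last_two.equals(medals.iloc[-2:]))  # True: tail(n) matches positional slicing
print(medals.shape)  # (rows, columns)
```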


Count how many rows (individual medal records) belong to each sport:

```python
medals.Sport.value_counts()
```

```
Athletics            2724
Aquatics             2558
Rowing               1791
Gymnastics           1736
Fencing              1226
Football              932
Shooting              874
Wrestling             847
Hockey                844
Sailing               800
Cycling               701
Equestrian            667
Boxing                610
Canoe / Kayak         582
Basketball            582
Volleyball            503
Handball              443
Weightlifting         354
Archery               209
Rugby                 192
Tennis                178
Judo                  155
Modern Pentathlon     141
Tug of War             94
Polo                   66
Lacrosse               59
Baseball               53
Golf                   30
Ice Hockey             27
Skating                27
Badminton              24
Cricket                24
Table Tennis           18
Rackets                10
Croquet                 8
Water Motorsports       5
Basque Pelota           4
Roque                   3
Jeu de paume
```
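`value_counts()` sorts the counts with the most frequent value first; the same numbers can be obtained with `groupby(...).size()` followed by a descending sort. A sketch on a stand-in Sport column built from three of the counts above (Athletics, Golf, Roque) rather than the full CSV:

```python
import pandas as pd

# A stand-in Sport column using three counts from the output above
sport = pd.Series(["Golf"] * 30 + ["Athletics"] * 2724 + ["Roque"] * 3, name="Sport")

counts = sport.value_counts()  # most frequent first
by_group = sport.groupby(sport).size().sort_values(ascending=False)  # same counts via groupby

print(counts)
print(counts.head(2))  # the two most common sports
```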