
We've seen how to load data and perform basic operations such as subsetting
and slicing.
Now let's look at what the DataFrame and Series objects actually are.

# Series Object

In [1]:
import pandas as pd

In [2]:
# manually create a series object
s = pd.Series(['banana', 42])

In [3]:
s

0    banana
1        42
dtype: object
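The `dtype: object` in the output above comes from mixing a string and an integer in one Series. As a minimal sketch (the variable names `mixed` and `numbers` are ours, not part of the notebook), we can compare it with a Series that holds only integers:

```python
import pandas as pd

# A string and an integer in the same Series: pandas falls back to the
# generic object dtype, as in the output above.
mixed = pd.Series(['banana', 42])

# Only integers: pandas can pick a numeric dtype instead.
numbers = pd.Series([1, 2, 3])

print(mixed.dtype)    # object
print(numbers.dtype)
```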

In [4]:
# we can also label the series elements by passing an index
s = pd.Series(['Wes', 'Creator'],
              index=['Person', 'Who'])

In [5]:
s

Person        Wes
Who       Creator
dtype: object
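Once a Series has labels, we can look values up by label as well as by position. A short sketch using the same Series (the names `person`, `first`, and `labels` are just for illustration):

```python
import pandas as pd

s = pd.Series(['Wes', 'Creator'], index=['Person', 'Who'])

person = s['Person']    # look up by index label
first = s.iloc[0]       # look up by integer position
labels = list(s.index)  # the labels themselves

print(person, first, labels)
```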

In [6]:
# creating python dictionaries
d = {'fname': 'daniel', 'lname': 'chen'}

In [7]:
# getting a key from a dictionary
d['fname']

'daniel'
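A dictionary and a labeled Series behave similarly, and pandas can build one from the other. The sketch below assumes we want the dictionary keys to become the index labels (the name `name_series` is ours):

```python
import pandas as pd

d = {'fname': 'daniel', 'lname': 'chen'}

# Passing a dictionary to pd.Series uses the keys as the index
# and the values as the data.
name_series = pd.Series(d)

print(name_series['fname'])
print(list(name_series.index))
```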

# DataFrame

In [8]:
# we use dictionaries to create dataframes
scientist = pd.DataFrame({
    'Name': ['rf', 'wg'],
    'Occupation': ['chem', 'stat']
})

In [9]:
scientist

|   | Name | Occupation |
|---|------|------------|
| 0 | rf   | chem       |
| 1 | wg   | stat       |
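When building a DataFrame from a dictionary, we can also control the row labels and the column order. This is a sketch only: the row labels `'a'` and `'b'` are hypothetical and chosen purely for illustration.

```python
import pandas as pd

labeled = pd.DataFrame(
    {'Name': ['rf', 'wg'], 'Occupation': ['chem', 'stat']},
    index=['a', 'b'],                # hypothetical row labels
    columns=['Occupation', 'Name'],  # explicit column order
)

print(labeled)
print(labeled.loc['a', 'Name'])
```

Without `index=`, pandas numbers the rows 0, 1, ... as in the output above.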


In [10]:
scientists = pd.read_csv('../data/scientists.csv')

In [11]:
scientists

|   | Name                 | Born       | Died       | Age | Occupation         |
|---|----------------------|------------|------------|-----|--------------------|
| 0 | Rosaline Franklin    | 1920-07-25 | 1958-04-16 | 37  | Chemist            |
| 1 | William Gosset       | 1876-06-13 | 1937-10-16 | 61  | Statistician       |
| 2 | Florence Nightingale | 1820-05-12 | 1910-08-13 | 90  | Nurse              |
| 3 | Marie Curie          | 1867-11-07 | 1934-07-04 | 66  | Chemist            |
| 4 | Rachel Carson        | 1907-05-27 | 1964-04-14 | 56  | Biologist          |
| 5 | John Snow            | 1813-03-15 | 1858-06-16 | 45  | Physician          |
| 6 | Alan Turing          | 1912-06-23 | 1954-06-07 | 41  | Computer Scientist |
| 7 | Johann Gauss         | 1777-04-30 | 1855-02-23 | 77  | Mathematician      |
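Each column of a DataFrame is itself a Series. To keep the sketch self-contained, it rebuilds the first two rows of the table by hand instead of reading the CSV file; the names `small`, `ages`, and `subset` are ours.

```python
import pandas as pd

# First two rows of the scientists table, typed in by hand.
small = pd.DataFrame({
    'Name': ['Rosaline Franklin', 'William Gosset'],
    'Age': [37, 61],
})

ages = small['Age']      # a single column name returns a Series
subset = small[['Age']]  # a list of column names returns a DataFrame

print(type(ages))
print(type(subset))
print(ages.mean())
```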


# Types

In [12]:
# get the data type stored in the columns
scientists.dtypes

Name          object
Born          object
Died          object
Age            int64
Occupation    object
dtype: object

In [13]:
# convert the Born column from strings to datetime values
born_time = pd.to_datetime(scientists['Born'],
                           format='%Y-%m-%d')

In [14]:
born_time

0   1920-07-25
1   1876-06-13
2   1820-05-12
3   1867-11-07
4   1907-05-27
5   1813-03-15
6   1912-06-23
7   1777-04-30
Name: Born, dtype: datetime64[ns]
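Real data often contains values that do not match the expected date format. As a hedged sketch, the `errors='coerce'` option of `pd.to_datetime` turns such values into `NaT` instead of raising an error. The string `'not a date'` is made up for this example and does not appear in the scientists data.

```python
import pandas as pd

# One valid date and one deliberately invalid value (made up for the example).
raw = pd.Series(['1920-07-25', 'not a date'])

# errors='coerce' replaces anything that cannot be parsed with NaT.
parsed = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')

print(parsed)
```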

In [15]:
# convert the Died column too; without format=, pandas infers the format
died_time = pd.to_datetime(scientists['Died'])

In [16]:
scientists.head()

|   | Name                 | Born       | Died       | Age | Occupation   |
|---|----------------------|------------|------------|-----|--------------|
| 0 | Rosaline Franklin    | 1920-07-25 | 1958-04-16 | 37  | Chemist      |
| 1 | William Gosset       | 1876-06-13 | 1937-10-16 | 61  | Statistician |
| 2 | Florence Nightingale | 1820-05-12 | 1910-08-13 | 90  | Nurse        |
| 3 | Marie Curie          | 1867-11-07 | 1934-07-04 | 66  | Chemist      |
| 4 | Rachel Carson        | 1907-05-27 | 1964-04-14 | 56  | Biologist    |


In [17]:
# assign the parsed birth dates to a new column
scientists['born_dt'] = born_time

In [18]:
# assign the died time to a new column
scientists['died_dt'] = died_time

In [19]:
# look at our results
scientists.head()

|   | Name                 | Born       | Died       | Age | Occupation   | born_dt    | died_dt    |
|---|----------------------|------------|------------|-----|--------------|------------|------------|
| 0 | Rosaline Franklin    | 1920-07-25 | 1958-04-16 | 37  | Chemist      | 1920-07-25 | 1958-04-16 |
| 1 | William Gosset       | 1876-06-13 | 1937-10-16 | 61  | Statistician | 1876-06-13 | 1937-10-16 |
| 2 | Florence Nightingale | 1820-05-12 | 1910-08-13 | 90  | Nurse        | 1820-05-12 | 1910-08-13 |
| 3 | Marie Curie          | 1867-11-07 | 1934-07-04 | 66  | Chemist      | 1867-11-07 | 1934-07-04 |
| 4 | Rachel Carson        | 1907-05-27 | 1964-04-14 | 56  | Biologist    | 1907-05-27 | 1964-04-14 |
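One reason to store dates as datetime values is that we can do arithmetic on them. The sketch below rebuilds the first two rows by hand so it runs on its own; the column names `lifespan` and `birth_year` are ours and are not part of the original data.

```python
import pandas as pd

# First two rows of the scientists table, typed in by hand.
df = pd.DataFrame({
    'Name': ['Rosaline Franklin', 'William Gosset'],
    'Born': ['1920-07-25', '1876-06-13'],
    'Died': ['1958-04-16', '1937-10-16'],
})

df['born_dt'] = pd.to_datetime(df['Born'], format='%Y-%m-%d')
df['died_dt'] = pd.to_datetime(df['Died'], format='%Y-%m-%d')

# Subtracting two datetime columns gives a timedelta column.
df['lifespan'] = df['died_dt'] - df['born_dt']

# The .dt accessor exposes date parts such as the year.
df['birth_year'] = df['born_dt'].dt.year

print(df[['Name', 'lifespan', 'birth_year']])
```

The same subtraction on the original string columns would raise an error, which is why the conversion matters.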


In [20]:
# Use info to get the column names, the number of non-null values, and the data types
scientists.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 7 columns):
Name          8 non-null object
Born          8 non-null object
Died          8 non-null object
Age           8 non-null int64
Occupation    8 non-null object
born_dt       8 non-null datetime64[ns]
died_dt       8 non-null datetime64[ns]
dtypes: datetime64[ns](2), int64(1), object(4)
memory usage: 528.0+ bytes
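`info()` reports non-null counts, so the number of missing values has to be worked out from them. A minimal sketch of counting missing values directly, using a small made-up frame (the data below is hypothetical, since the scientists table has no missing values):

```python
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    'Name': ['a', 'b', None],
    'Age': [37.0, None, 61.0],
})

missing = df.isna().sum()    # missing values per column
non_null = df.notna().sum()  # non-null values per column, as info() reports

print(missing)
print(non_null)
```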


In [21]:
# Use the dtypes attribute to get only the types
scientists.dtypes

Name                  object
Born                  object
Died                  object
Age                    int64
Occupation            object
born_dt       datetime64[ns]
died_dt       datetime64[ns]
dtype: object
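Knowing the types lets us pick columns by type or convert between types. This sketch again uses two hand-typed rows so that it runs without the CSV file; the variable names are ours.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Rosaline Franklin', 'William Gosset'],
    'Age': [37, 61],
    'born_dt': pd.to_datetime(['1920-07-25', '1876-06-13']),
})

# select_dtypes filters columns by their data type.
numeric_cols = df.select_dtypes(include='number').columns.tolist()
date_cols = df.select_dtypes(include='datetime').columns.tolist()

# astype converts a column to another type, here integers to strings.
age_as_text = df['Age'].astype(str)

print(numeric_cols, date_cols)
print(age_as_text.tolist())
```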