In [1]:
import numpy as np 
import pandas as pd

The two most important data structures you will need to understand are:
    1. Series
    2. DataFrame

### 1. Series

- This is a one-dimensional, array-like object
- It contains an array of data (of any NumPy data type) and an associated array of data labels
- The data labels are called the index

###### a). Creating a Series object from a list or array

In [2]:
data = pd.Series([2, 4, 5, 6])
data

0    2
1    4
2    5
3    6
dtype: int64

In [3]:
my_array = [23, 66, 78, 43, 55]
ser = pd.Series(my_array)
ser

0    23
1    66
2    78
3    43
4    55
dtype: int64
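The cells above pass plain Python lists. A minimal sketch of the same idea with a NumPy array as input; the names np_values and np_ser are illustrative and not part of the original notebook:

```python
import numpy as np
import pandas as pd

# Illustrative input: a one-dimensional NumPy array of floats
np_values = np.array([1.5, 2.5, 3.5])

# The Series takes its dtype from the array and gets a default integer index
np_ser = pd.Series(np_values)

print(np_ser.dtype)   # float64
print(len(np_ser))    # 3
```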

###### b). Creating a Series object from a dictionary
   - When you create a Series object from a dictionary, the dictionary keys are used as the index

In [14]:
#Creating the dictionary
dic = {'Texas': 13000, 'Ohio': 2343544, 'Seattle': 453243, 'NYC': 234455}
dic

{'Texas': 13000, 'Ohio': 2343544, 'Seattle': 453243, 'NYC': 234455}

In [19]:
#Creating a series object from the dictionary
ser = pd.Series(dic)

In [16]:
ser

Texas        13000
Ohio       2343544
Seattle     453243
NYC         234455
dtype: int64
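You can also pass your own list of labels together with the dictionary. The sketch below assumes a hypothetical label list in which 'Utah' is deliberately not a key of dic, to show that the result follows the order of the labels and that a label without a matching key gets NaN:

```python
import pandas as pd

dic = {'Texas': 13000, 'Ohio': 2343544, 'Seattle': 453243, 'NYC': 234455}

# Hypothetical labels: 'Utah' does not exist in dic
labels = ['NYC', 'Texas', 'Utah']

# Only the requested labels are kept, in the order given
subset = pd.Series(dic, index=labels)

print(subset)
print(subset.isna())  # True only for 'Utah'
```

Because NaN is a floating-point value, the dtype of subset becomes float64 instead of int64.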

Get the values and the index of your Series object like this:

In [4]:
#Getting the values of the Series object
data.values

array([2, 4, 5, 6], dtype=int64)

In [5]:
#Getting the index of the series object
data.index

RangeIndex(start=0, stop=4, step=1)
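If you need the values or the labels outside Pandas, you can convert them. A small sketch, assuming the same data Series as above; arr and labels are illustrative names:

```python
import pandas as pd

data = pd.Series([2, 4, 5, 6])

# Convert the values to a NumPy array
arr = data.to_numpy()

# Convert the index labels to a plain Python list
labels = data.index.tolist()

print(type(arr).__name__)  # ndarray
print(labels)              # [0, 1, 2, 3]
```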

#### - Index
- By default, your Series object gets an integer index running from 0 to N - 1, where N is the number of elements.
- To identify each data point with your own labels, pass them through the index argument:

In [6]:
values = pd.Series([23, 4, 3, 5, 6, 4], index=['a', 'b', 'c', 'd', 'e', 'f'])
values

a    23
b     4
c     3
d     5
e     6
f     4
dtype: int64

In [7]:
#Checking the index
values.index

Index(['a', 'b', 'c', 'd', 'e', 'f'], dtype='object')
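The index can also be replaced after the Series has been created, and you can test whether a label exists with the in operator. A minimal sketch; the new labels and the name relabeled are made up for illustration:

```python
import pandas as pd

values = pd.Series([23, 4, 3, 5, 6, 4], index=['a', 'b', 'c', 'd', 'e', 'f'])

# Work on a copy so the original Series keeps its labels
relabeled = values.copy()

# The new index must have the same length as the Series
relabeled.index = ['u', 'v', 'w', 'x', 'y', 'z']

print(relabeled['u'])   # 23
print('a' in values)    # True, membership is checked against the index
print('u' in values)    # False
```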

#### - Slicing

- Slicing a Pandas Series object is straightforward.
- Slicing by position works like the usual NumPy slicing. When you slice by label, however, the end label is included in the result.
- See the examples below:

In [8]:
values

a    23
b     4
c     3
d     5
e     6
f     4
dtype: int64

In [10]:
#Get element at index 'a'
values['a']

23

In [11]:
#Get element at index 'c'
values['c']

3

In [12]:
#Get all elements between 'a' and 'c' with 'c' included
values['a' : 'c']

a    23
b     4
c     3
dtype: int64

In [13]:
#Get all elements from 'c' and beyond
values['c' :]

c    3
d    5
e    6
f    4
dtype: int64
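To make the difference between label-based and position-based slicing explicit, you can use the loc and iloc accessors. A short sketch on the same values Series; by_label and by_position are illustrative names:

```python
import pandas as pd

values = pd.Series([23, 4, 3, 5, 6, 4], index=['a', 'b', 'c', 'd', 'e', 'f'])

# Label-based slice: the end label 'c' is included
by_label = values.loc['a':'c']

# Position-based slice: the end position 2 is excluded, as in NumPy
by_position = values.iloc[0:2]

print(len(by_label))      # 3
print(len(by_position))   # 2
```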

### 2. DataFrame

- You may think of a Pandas DataFrame as a table-like or spreadsheet-like data structure
- A DataFrame is made up of a collection of columns, each of which may hold a different type (such as booleans, strings, etc.)
- It is easier to view a Pandas DataFrame as a dictionary of Pandas Series objects that share the same index (in fact, one way of creating a DataFrame is from Series objects)
- A DataFrame is made up of rows and columns.
- Both rows and columns are indexed.
- If a Series is a one-dimensional object, then a DataFrame is a two-dimensional object
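The points above can be sketched with a small, made-up example: three Series of different types are combined into one DataFrame, and selecting a column returns a Series again. All names and values here are illustrative only:

```python
import pandas as pd

# Three illustrative Series holding different types
names = pd.Series(['Ann', 'Ben', 'Cy'])
ages = pd.Series([31, 25, 40])
is_member = pd.Series([True, False, True])

# The dictionary keys become the column names
people = pd.DataFrame({'name': names, 'age': ages, 'is_member': is_member})

print(people.dtypes)                  # one dtype per column
print(people.shape)                   # (3, 3)
print(type(people['age']).__name__)   # Series
```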

##### - Creating DataFrames from dictionaries

In [27]:
#Creating the dictionary
pop_dict = {
    'countries': ['Kenya', 'Uganda', 'Tanzania', 'Eritrea', 'Somalia'],
    'population': [47000000, 60000000, 33000000, 120000000, 80000000],
}
pop_dict

{'countries': ['Kenya', 'Uganda', 'Tanzania', 'Eritrea', 'Somalia'],
 'population': [47000000, 60000000, 33000000, 120000000, 80000000]}

In [29]:
#Creating the dataframe object from 'pop_dict'
df = pd.DataFrame(pop_dict)
df

   countries  population
0      Kenya    47000000
1     Uganda    60000000
2   Tanzania    33000000
3    Eritrea   120000000
4    Somalia    80000000
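Once the DataFrame exists, the dictionary keys are available as column labels, and each column can be pulled out as a Series. The sketch below also uses set_index to turn the 'countries' column into the row index; by_country is an illustrative name:

```python
import pandas as pd

pop_dict = {
    'countries': ['Kenya', 'Uganda', 'Tanzania', 'Eritrea', 'Somalia'],
    'population': [47000000, 60000000, 33000000, 120000000, 80000000],
}
df = pd.DataFrame(pop_dict)

# Column labels come from the dictionary keys
print(df.columns.tolist())        # ['countries', 'population']

# A single column is a Series, so Series methods apply
print(df['population'].max())     # 120000000

# Use the country names as row labels instead of 0..4
by_country = df.set_index('countries')
print(by_country.loc['Kenya', 'population'])   # 47000000
```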


##### - Creating DataFrames from a Pandas Series object

In [31]:
#Creating the Series object
ser = pd.Series(['Rogers', 'Macharia', 'Muthoni', 'Anthony', 'Wandiri'])
ser

0      Rogers
1    Macharia
2     Muthoni
3     Anthony
4     Wandiri
dtype: object

In [34]:
#Creating a DataFrame from the series object 'ser'
ser_df = pd.DataFrame(ser, columns = ['names'])
ser_df

      names
0    Rogers
1  Macharia
2   Muthoni
3   Anthony
4   Wandiri
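Another way to do the same conversion is the Series method to_frame, which names the resulting column directly. A minimal sketch on the same data; alt_df is an illustrative name:

```python
import pandas as pd

ser = pd.Series(['Rogers', 'Macharia', 'Muthoni', 'Anthony', 'Wandiri'])

# Convert the Series into a one-column DataFrame called 'names'
alt_df = ser.to_frame(name='names')

print(alt_df.columns.tolist())   # ['names']
print(alt_df.shape)              # (5, 1)
```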
