# Pandas 
---
Pandas is a lightweight data-frame library for Python. It is mainly used for data analysis and is often paired with NumPy and matplotlib's pyplot, which add numerical and plotting capabilities. Pandas is typically imported as `pd`.

In [2]:
import pandas as pd

### Elements of the DataFrame and Series


**_Index_**: An index is similar to the row labels in Excel or a primary key in a database: it identifies each row. Unlike a primary key, index values in Pandas do not have to be unique. Repeated values raise no error, although duplicate labels may be unsuitable when the data must behave like a keyed database table. An index can be created from a list of values.

**_Series_**: A series is similar to a dictionary: it holds a single column of data and a single index that labels the data.  
A series is created by calling the following constructor:
```
pd.Series(data, index=index)
```
The index defaults to the pattern [0, 1, 2, ..., n-1], or it can be passed as a list of values. Data can be a list of values, a dictionary (whose keys become the index and whose values become the data), or a scalar value that is repeated for every index label.
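The scalar and dictionary cases described above can be sketched as follows. This is a minimal illustration only; the names `labels`, `constant`, and `from_dict` are made up for the example.

```python
import pandas as pd

# An explicit Index built from a list; repeated labels are allowed
labels = pd.Index(['a', 'b', 'a'])
print(labels.is_unique)            # False, because 'a' appears twice

# A scalar is repeated for every label in the index
constant = pd.Series(5, index=labels)
print(constant.tolist())           # [5, 5, 5]

# A dictionary supplies both the labels (keys) and the data (values)
from_dict = pd.Series({'x': 1.5, 'y': 2.5})
print(from_dict.index.tolist())    # ['x', 'y']
```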

**_NaN_**: "Not a Number", the standard marker for a missing value in Pandas
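A short sketch of how NaN behaves in practice, assuming a small hypothetical series named `readings`. It uses `isna`, `dropna`, and `fillna`, which are the usual ways to detect, discard, or replace missing values.

```python
import numpy as np
import pandas as pd

# A series with one missing value
readings = pd.Series([1.0, np.nan, 3.0])

print(readings.isna().tolist())     # [False, True, False]
print(readings.dropna().tolist())   # [1.0, 3.0]
print(readings.fillna(0).tolist())  # [1.0, 0.0, 3.0]
print(readings.sum())               # 4.0, since NaN is skipped by default
```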


### Series 
##### Creation
Creating a series is described above in the definition of a series. Below are several examples:

In [3]:
## 3 ways to create Series
series_1 = pd.Series([2,4,6,8])
series_1

0    2
1    4
2    6
3    8
dtype: int64

series_1 is not given an explicit index, so it uses the default index of [0, 1, 2, ...]. dtype is the data type of the values; when it is not passed to the constructor it is inferred, and becomes int64 if all values are integers.

In [4]:
series_2 = pd.Series([2,4,6.5,8], index =['a','b', 'c', 'a'])
series_2

a    2.0
b    4.0
c    6.5
a    8.0
dtype: float64

series_2 is given an explicit index and shows that a Series' index does not have to be numerical or unique. The dtype is float64 because 6.5 is a float, so the integer values are stored as floats as well.

In [5]:
series_3 = pd.Series({'i': 'do', 'not': 'like', 'green': 'eggs', 'and': 'ham', 'I': '2'})
series_3

I           2
and       ham
green    eggs
i          do
not      like
dtype: object

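series_3 is built from a dictionary: the keys become the index and the values become the data. Its data type (dtype) is object because the values are strings, not numerical values. The output above lists the keys in sorted order, which is how older versions of Pandas handled dictionaries; recent versions keep the dictionary's insertion order.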

### Slicing Series

In [6]:
## A series can be sliced like a dictionary
series_2['a']

a    2
a    8
dtype: float64

In [7]:
## A series can be sliced by integer position (.iloc makes the positional lookup explicit)
series_3.iloc[2]

'eggs'

In [8]:
## A series can be sliced in sets using regular slicing methods
series_1[:3]

0    2
1    4
2    6
dtype: int64

In [9]:
## A series can be sliced using a list of integer positions (in the order series_3 is displayed above)
print(series_3.iloc[[3,1,4]])
## or a list of index values 
print('\n',series_3[['i','and','not']])

i        do
and     ham
not    like
dtype: object

 i        do
and     ham
not    like
dtype: object


In [10]:
## Lastly, a series can be filtered with a boolean condition, such as values above the median
series_2[series_2 > series_2.median()]

c    6.5
a    8.0
dtype: float64

##### Operations and Series 
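A series can be added to, subtracted from, multiplied by, or divided by a scalar or another series, and many other operations apply to a single series. When two series are combined, values are matched by index label, and labels found in only one series produce NaN. For more information, see the [documentation of Series.](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html) Several common operations are shown below.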

In [11]:
s_1_multiplied = series_1 * 4
s_1_multiplied

0     8
1    16
2    24
3    32
dtype: int64

In [12]:
s_1_special = series_1[0:2] + series_1[1:]
s_1_special

0   NaN
1     8
2   NaN
3   NaN
dtype: float64
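The NaN values above come from index alignment: only label 1 exists in both slices. As a hedged sketch, `Series.add` with `fill_value=0` treats a label missing from one side as zero instead. The names `left`, `right`, and `filled` are illustrative.

```python
import pandas as pd

series_1 = pd.Series([2, 4, 6, 8])
left = series_1.iloc[0:2]    # labels 0, 1
right = series_1.iloc[1:]    # labels 1, 2, 3

# Plain addition: only label 1 exists on both sides
print((left + right).tolist())    # [nan, 8.0, nan, nan]

# add() with fill_value substitutes 0 for a label missing from one side
filled = left.add(right, fill_value=0)
print(filled.tolist())            # [2.0, 8.0, 6.0, 8.0]
```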

### Dataframe 
A dataframe is a two-dimensional extension of a series: a table in which every column is a series and all columns share one index. A dataframe may be created in the following ways:
* **_Dictionary of Series_**: Each key becomes a column name and each value becomes that column's data. If a list of columns is also passed to the constructor, only those keys are used, in that order, and a listed name that is not in the dictionary produces a column of NaN
* **_List of Lists_**: A dataframe can be created from a list of lists, where each inner list is a row. Column names default to integers, so they should be passed as well in order to have a properly labeled dataframe
* **_List of Dicts_**: A list of dictionaries can create a dataframe. Each dictionary is a row and its keys become the column names; the index defaults to [0, 1, 2, ...] unless one is passed
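The three creation routes listed above can be sketched as follows. The data and the names `from_series`, `from_lists`, and `from_dicts` are invented for illustration.

```python
import pandas as pd

# 1. Dictionary of Series: keys become column names, rows are aligned by index
from_series = pd.DataFrame({
    'red': pd.Series([4, 5], index=['a', 'b']),
    'white': pd.Series([5, 1, 2], index=['a', 'b', 'c']),
})
# 'red' has no value for row 'c', so that cell is NaN

# 2. List of lists: each inner list is a row; labels are passed separately
from_lists = pd.DataFrame([[4, 5], [5, 1]],
                          index=['a', 'b'], columns=['red', 'white'])

# 3. List of dicts: keys become column names, index defaults to 0, 1, ...
from_dicts = pd.DataFrame([{'red': 4, 'white': 5}, {'red': 5}])

print(from_series)
print(from_lists)
print(from_dicts)
```

In the third case the second dictionary has no `'white'` key, so that cell is filled with NaN.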

##### Column Selection, Addition, Deletion
* **_Addition_**: A column can be added the same way a key is added to a dictionary: 
    
        df['new_column_name'] = data  # or an expression built from existing columns

* **_Deletion_**: A column can be deleted in place by calling:

        del df['column_name']
* **_Popping_**: A column can be removed from the current dataframe and returned as its own series by calling:

        column = df.pop('column_name')
* **_Assigning_**: `assign` returns a new dataframe with an extra column based on a series, a lambda function, or an operation involving columns of the dataframe:

        updated_dataframe = df.assign(column_name=some_function)
* **_Slicing_**: A single row can be selected by its index label with `.loc`, which returns a series whose index is the column names. Several rows can be selected at once by position with an integer slice:

        sliced_row = df.loc['row_label']
        sliced_rows = df[3:5]  # rows at positions 3 and 4

In [13]:
## Creating a dataframe for testing and examples
df = pd.DataFrame([[4,5,3,1], [5,1,9,-2], [-6,2,1,0]], index = ['a','b','c'], columns = ['red', 'white', 'blue', 'green'])
df

|   | red | white | blue | green |
|---|-----|-------|------|-------|
| a | 4   | 5     | 3    | 1     |
| b | 5   | 1     | 9    | -2    |
| c | -6  | 2     | 1    | 0     |


In [14]:
## Applying assign using a lambda function to create a new column
df_rw_ratios = df.assign(rw_ratio = lambda x: x['red'] / x['white'])
df_rw_ratios

|   | red | white | blue | green | rw_ratio |
|---|-----|-------|------|-------|----------|
| a | 4   | 5     | 3    | 1     | 0.8      |
| b | 5   | 1     | 9    | -2    | 5.0      |
| c | -6  | 2     | 1    | 0     | -3.0     |


In [15]:
## Applying pop to a dataframe
green = df_rw_ratios.pop('green')
print(green)
print(df_rw_ratios)

a    1
b   -2
c    0
Name: green, dtype: int64
   red  white  blue  rw_ratio
a    4      5     3       0.8
b    5      1     9       5.0
c   -6      2     1      -3.0
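The remaining operations from the list above (addition, deletion, and row slicing) can be sketched on the same test dataframe. The column name `red_plus_blue` is made up for this example.

```python
import pandas as pd

df = pd.DataFrame([[4, 5, 3, 1], [5, 1, 9, -2], [-6, 2, 1, 0]],
                  index=['a', 'b', 'c'],
                  columns=['red', 'white', 'blue', 'green'])

# Addition: a new column computed from existing columns
df['red_plus_blue'] = df['red'] + df['blue']   # 7, 14, -5

# Deletion: remove a column in place
del df['green']

# Row selection by label returns a Series indexed by column name
row_b = df.loc['b']
print(row_b['red_plus_blue'])                  # 14

# Integer slices inside [] select rows by position
first_two = df[0:2]
print(first_two.index.tolist())                # ['a', 'b']
```

Note that `del` and column assignment modify `df` itself, whereas `assign` returns a new dataframe and leaves the original untouched.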


##### Accessing Data in Pandas


### Importing and Exporting Data 
The data used below is from the Canadian Government's Open Data portal. The data is a JSON file listing Canadian embassies and consulates.



In [44]:
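## Path to a local copy of the JSON file
canadian_embassies_loc = '/Users/HudsonAccount/Desktop/embassies-consulates-list.json'
x = pd.read_json(canadian_embassies_loc)
## Expand each nested record in the 'data' column into its own columns
y = x['data'].apply(pd.Series)
## Expand the nested 'country' and 'offices' fields the same way
z = y['country'].apply(pd.Series)
a = y['offices'].apply(pd.Series)
## Use the English-name field as the index (returns a new dataframe)
z.set_index('eng')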




| eng | country-id | country-iso | fra | offices-help-abroad | 0 |
|-----|------------|-------------|-----|---------------------|---|
| {'name': 'Afghanistan'} | 1000 | AF | {'name': 'Afghanistan'} | {'refer-to-office-id': '', 'eng': {'closing-te... | |
| {'name': 'Antigua and Barbuda'} | 10000 | AG | {'name': 'Antigua-et-Barbuda'} | {'refer-to-office-id': 25000, 'eng': {'closing... | |
| {'name': 'Germany'} | 100000 | DE | {'name': 'Allemagne'} | {'refer-to-office-id': '', 'eng': {'closing-te... | |
| {'name': 'Ghana'} | 101000 | GH | {'name': 'Ghana'} | {'refer-to-office-id': '', 'eng': {'closing-te... | |
| {'name': 'Greece'} | 106000 | GR | {'name': 'Grèce'} | {'refer-to-office-id': '', 'eng': {'closing-te... | |
| {'name': 'Greenland'} | 107000 | GL | {'name': 'Groenland'} | {'refer-to-office-id': 73000, 'eng': {'closing... | |
| {'name': 'Grenada'} | 109000 | GD | {'name': 'Grenade'} | {'refer-to-office-id': 25000, 'eng': {'closing... | |
| {'name': 'Argentina'} | 11000 | AR | {'name': 'Argentine'} | {'refer-to-office-id': '', 'eng': {'closing-te... | |
| {'name': 'Guadeloupe'} | 110000 | GP | {'name': 'Guadeloupe'} | {'refer-to-office-id': 25000, 'eng': {'closing... | |
| {'name': 'Guam'} | 111000 | GU | {'name': 'Guam'} | {'refer-to-office-id': 186000, 'eng': {'closin... | |
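In the output above, the `eng` and `fra` cells still hold nested dictionaries. One possible alternative, sketched here under assumptions, is `pd.json_normalize` (available in Pandas 1.0 and later), which flattens nested dictionaries into dotted column names. The `sample` records below are hypothetical and only mimic a few fields visible in the output; the layout of the real file may differ.

```python
import pandas as pd

# Hypothetical records mimicking part of the nested layout seen above
sample = [
    {'country': {'country-id': 1000, 'country-iso': 'AF',
                 'eng': {'name': 'Afghanistan'},
                 'fra': {'name': 'Afghanistan'}}},
    {'country': {'country-id': 100000, 'country-iso': 'DE',
                 'eng': {'name': 'Germany'},
                 'fra': {'name': 'Allemagne'}}},
]

# json_normalize flattens nested dictionaries into dotted column names
flat = pd.json_normalize(sample)
print(flat.columns.tolist())

# Use the flattened English name as the index
by_name = flat.set_index('country.eng.name')
print(by_name.loc['Germany', 'country.country-iso'])   # DE
```

With the names flattened, the index holds plain strings such as `'Germany'` rather than dictionaries, which makes label-based lookups with `.loc` straightforward.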

