# How to Import Data in Python

## Learning Objectives
One reason Python is such a popular programming language for machine learning is that it supports powerful, easy-to-use packages that are purpose-built for data analysis. One of these is the **pandas** package, which provides several easy-to-use functions for creating, structuring, and importing data. By the end of this tutorial, you will have learned:

+ what a pandas Series is
+ what a pandas DataFrame is
+ how to import data from a CSV file
+ how to import data from an Excel file

Before we can use any of these functions, we first have to import the **pandas** package using the **import** command. Here, the import command imports the **pandas** package under an alias, **pd**. This allows us to refer to the functions of the package by writing **pd.** followed by the function name.

In [5]:
import pandas as pd

One of the ways pandas represents data is as a series. A pandas series is a one-dimensional, array-like data structure with labeled rows that can hold values of any data type. We can create a pandas series from a previously created list.

In [6]:
members = ["Brazil", "Russia", "India", "China", "South Africa"]

Given the members list, we can create a series object as follows. We create a series object called brics1 by calling the pd.Series constructor function and passing it the members list. As you can see, the series object is made up of a set of index labels on the left and values on the right.

In [7]:
brics1 = pd.Series(members)
brics1

0          Brazil
1          Russia
2           India
3           China
4    South Africa
dtype: object

To verify that brics1 is a pandas series, let's pass it to the type function to see what we get.

In [8]:
type(brics1)

pandas.core.series.Series
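
As a small illustration of the index and value halves described above, the sketch below inspects both parts of a series and then builds a second series with custom labels. The `index` argument and the two-letter country codes are choices made for this example only; they do not appear in the original walkthrough.

```python
import pandas as pd

members = ["Brazil", "Russia", "India", "China", "South Africa"]

# Default case: pandas assigns the integer labels 0 through 4.
brics1 = pd.Series(members)
default_labels = list(brics1.index)
values = list(brics1)

# Hypothetical custom labels (country codes chosen for this example only).
codes = ["BR", "RU", "IN", "CN", "ZA"]
brics_by_code = pd.Series(members, index=codes)

# .loc looks a value up by label; .iloc looks it up by position.
print(brics_by_code.loc["IN"])   # India
print(brics_by_code.iloc[0])     # Brazil
```

Labels do not change the values themselves; they only change how each row is addressed.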

Another way that pandas represents data is as a data frame. A pandas data frame is a heterogeneous, two-dimensional data structure with labeled rows and columns. We can think of a pandas data frame as a collection of several pandas series, all sharing the same index. A data frame is very similar to a spreadsheet or a relational database table.
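
The "collection of series" view can be sketched roughly as follows. The variable names `country`, `gdp`, and `small` are made up for this example, and only three of the five countries are used to keep it short.

```python
import pandas as pd

# Two series that share the default index 0, 1, 2.
country = pd.Series(["Brazil", "Russia", "India"])
gdp = pd.Series([2750, 1658, 3202])

# A dict of series becomes one column per key, aligned on the shared index.
small = pd.DataFrame({"country": country, "gdp": gdp})

# Selecting a single column hands back a series again.
gdp_column = small["gdp"]
print(type(gdp_column))

# That column carries the same index as the data frame it came from.
print(gdp_column.index.equals(small.index))   # True
```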

We can create a pandas data frame from a previously created dictionary. Given the members dictionary, we can create a data frame object as follows.

In [10]:
members = {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
           "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
           "gdp": [2750, 1658, 3202, 15270, 370],
           "literacy": [0.944, 0.997, 0.721, 0.964, 0.943],
           "expectancy": [76.8, 72.7, 68.8, 76.4, 63.6],
           "population": [210.87, 143.96, 1367.09, 1415.05, 57.4]}

Here, we create brics2 by calling the pd.DataFrame constructor function and passing it the members dictionary. As you can see, pandas converted the dictionary keys to column names, and it used the values for each dictionary key as the cell values in the data frame.

In [11]:
brics2 = pd.DataFrame(members)
brics2

|   | country | capital | gdp | literacy | expectancy | population |
|---|---|---|---|---|---|---|
| 0 | Brazil | Brasilia | 2750 | 0.944 | 76.8 | 210.87 |
| 1 | Russia | Moscow | 1658 | 0.997 | 72.7 | 143.96 |
| 2 | India | New Delhi | 3202 | 0.721 | 68.8 | 1367.09 |
| 3 | China | Beijing | 15270 | 0.964 | 76.4 | 1415.05 |
| 4 | South Africa | Pretoria | 370 | 0.943 | 63.6 | 57.4 |


To verify that brics2 is a data frame, let's call the type function to see what it returns. There we have it. It is a data frame.

In [12]:
type(brics2)

pandas.core.frame.DataFrame
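
Beyond checking the type, it can help to confirm how pandas interpreted the dictionary. The sketch below, which uses a trimmed three-column version of the members dictionary and a made-up variable name `brics`, looks at the column names, the shape, and the per-column data types, which is where the "heterogeneous" nature of a data frame shows up.

```python
import pandas as pd

# A trimmed copy of the members dictionary (three of the six keys).
members = {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
           "gdp": [2750, 1658, 3202, 15270, 370],
           "literacy": [0.944, 0.997, 0.721, 0.964, 0.943]}

brics = pd.DataFrame(members)

print(list(brics.columns))   # ['country', 'gdp', 'literacy']
print(brics.shape)           # (5, 3)

# Each column has its own dtype: text, integers, and floats side by side.
print(brics.dtypes)
```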

We can also create a data frame from a previously created two dimensional list of values and a list of column names. Given the members and labels lists, we can create a data frame object as follows.

In [13]:
members = [["Brazil", "Brasilia", 2750, 0.944, 76.8, 210.87],
           ["Russia", "Moscow", 1658, 0.997, 72.7, 143.96],
           ["India", "New Delhi", 3202, 0.721, 68.8, 1367.09],
           ["China", "Beijing", 15270, 0.964, 76.4, 1415.05],
           ["South Africa", "Pretoria", 370, 0.943, 63.6, 57.4]]
labels = ["country", "capital", "gdp", "literacy", "expectancy", "population"]

This time, we create brics3 by passing the pd.DataFrame constructor function the members list, along with the labels list as the column names.

In [14]:
brics3 = pd.DataFrame(members, columns=labels)
brics3

|   | country | capital | gdp | literacy | expectancy | population |
|---|---|---|---|---|---|---|
| 0 | Brazil | Brasilia | 2750 | 0.944 | 76.8 | 210.87 |
| 1 | Russia | Moscow | 1658 | 0.997 | 72.7 | 143.96 |
| 2 | India | New Delhi | 3202 | 0.721 | 68.8 | 1367.09 |
| 3 | China | Beijing | 15270 | 0.964 | 76.4 | 1415.05 |
| 4 | South Africa | Pretoria | 370 | 0.943 | 63.6 | 57.4 |
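
The dictionary route and the list-of-rows route should produce the same data frame. One way to check this, sketched here on a two-row, three-column subset with made-up variable names, is the `DataFrame.equals` method, which compares shape, values, and column data types.

```python
import pandas as pd

labels = ["country", "capital", "gdp"]

# The same data laid out column-wise (dict) and row-wise (list of lists).
as_dict = {"country": ["Brazil", "Russia"],
           "capital": ["Brasilia", "Moscow"],
           "gdp": [2750, 1658]}
as_rows = [["Brazil", "Brasilia", 2750],
           ["Russia", "Moscow", 1658]]

from_dict = pd.DataFrame(as_dict)
from_rows = pd.DataFrame(as_rows, columns=labels)

# equals() returns a single boolean rather than an element-wise table.
same = from_dict.equals(from_rows)
print(same)
```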


Another way to create a pandas data frame is by importing data directly from an external source. For example, we can create a data frame by importing a CSV file.

So, let us create another data frame, brics4. This time, we use the pd.read_csv function, and we pass to it the name of the file we want it to read.

In [15]:
brics4 = pd.read_csv("brics02.csv")
brics4

|   | country | capital | gdp | literacy | expectancy | population |
|---|---|---|---|---|---|---|
| 0 | Brazil | Brasilia | 2750 | 0.944 | 76.8 | 210.87 |
| 1 | Russia | Moscow | 1658 | 0.997 | 72.7 | 143.96 |
| 2 | India | New Delhi | 3202 | 0.721 | 68.8 | 1367.09 |
| 3 | China | Beijing | 15270 | 0.964 | 76.4 | 1415.05 |
| 4 | South Africa | Pretoria | 370 | 0.943 | 63.6 | 57.4 |
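
To experiment with `pd.read_csv` without having the brics02.csv file at hand, the following sketch reads CSV text from memory instead. The `csv_text` string is a stand-in written for this example (the real file's exact contents are not shown in this tutorial), and the `index_col` argument is an optional extra that the walkthrough above does not use.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = """country,capital,gdp
Brazil,Brasilia,2750
Russia,Moscow,1658
India,New Delhi,3202
"""

# read_csv accepts a file path or any file-like object.
brics_csv = pd.read_csv(io.StringIO(csv_text))
print(brics_csv.shape)            # (3, 3)
print(list(brics_csv.columns))    # ['country', 'capital', 'gdp']

# index_col uses a column as the row labels instead of 0, 1, 2, ...
by_country = pd.read_csv(io.StringIO(csv_text), index_col="country")
print(by_country.loc["India", "gdp"])   # 3202
```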


We can also create a pandas data frame by importing a Microsoft Excel file. This time we're going to call it brics5, and we will use the pd.read_excel function, passing to it the name of the file we intend to read.

In this example, we read from an Excel file. Note that for multi-sheet Excel files, the pandas read_excel function imports the first sheet by default.

In [16]:
brics5 = pd.read_excel("brics02.xlsx")
brics5

|   | country | capital | gdp | literacy | expectancy | population |
|---|---|---|---|---|---|---|
| 0 | Brazil | Brasilia | 2750 | 0.944 | 76.8 | 210.87 |
| 1 | Russia | Moscow | 1658 | 0.997 | 72.7 | 143.96 |
| 2 | India | New Delhi | 3202 | 0.721 | 68.8 | 1367.09 |
| 3 | China | Beijing | 15270 | 0.964 | 76.4 | 1415.05 |
| 4 | South Africa | Pretoria | 370 | 0.943 | 63.6 | 57.4 |


If we want to import a sheet other than the first one, we have to specify a value for the sheet_name argument of the read_excel function.

For example, the BRICS Excel file we just imported has two sheets. The first is the members sheet and the second is the Summits sheet. When we imported the file, the function imported the first sheet, which is the members sheet. To import the second sheet, which is the Summits sheet, we make the following modification to our code.

In [17]:
brics6 = pd.read_excel("brics02.xlsx", sheet_name="Summits")
brics6

|   | summit | date | host | leader | location |
|---|---|---|---|---|---|
| 0 | 1st | June 16th, 2009 | Russia | Dmitry Medvedev | Yekaterinburg (Sevastianov's House) |
| 1 | 2nd | April 15th, 2010 | Brazil | Luiz Inácio Lula da Silva | Brasília (Itamaraty Palace) |
| 2 | 3rd | April 14th, 2011 | China | Hu Jintao | Sanya (Sheraton Sanya Resort) |
| 3 | 4th | March 29th, 2012 | India | Manmohan Singh | New Delhi (Taj Mahal Hotel) |
| 4 | 5th | March 26th – 27th, 2013 | South Africa | Jacob Zuma | Durban (Durban ICC) |
| 5 | 6th | July 14th – 17th, 2014 | Brazil | Dilma Rousseff | Fortaleza (Centro de Eventos do Ceará) |
| 6 | 7th | July 8th – 9th, 2015 | Russia | Vladimir Putin | Ufa (Congress Hall) |
| 7 | 8th | October 15th – 16th, 2016 | India | Narendra Modi | Benaulim (Taj Exotica) |
| 8 | 9th | September 3th – 5th, 2017 | China | Xi Jinping | Xiamen (Xiamen International Conference Center) |
| 9 | 10th | July 25th – 27th, 2018 | South Africa | Cyril Ramaphosa | Johannesburg (Sandton Convention Centre) |
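
The `sheet_name` argument accepts more than a sheet's name. The sketch below, which is only an illustration, writes a tiny two-sheet workbook to a temporary folder and reads it back by name, by zero-based position, and with `sheet_name=None` to load every sheet into a dictionary. The helper `round_trip`, the file name `demo.xlsx`, the sheet name `Members`, and the miniature data are all assumptions made for this example. Reading and writing .xlsx files also relies on an optional engine such as openpyxl, so the demo is skipped when none is installed.

```python
import os
import tempfile

import pandas as pd

# Small stand-in frames; the real brics02.xlsx is not reproduced here.
members = pd.DataFrame({"country": ["Brazil", "Russia"],
                        "capital": ["Brasilia", "Moscow"]})
summits = pd.DataFrame({"summit": ["1st", "2nd"],
                        "host": ["Russia", "Brazil"]})


def round_trip(path):
    """Write a two-sheet workbook to path, then read it back three ways."""
    with pd.ExcelWriter(path) as writer:
        members.to_excel(writer, sheet_name="Members", index=False)
        summits.to_excel(writer, sheet_name="Summits", index=False)

    by_name = pd.read_excel(path, sheet_name="Summits")   # select by sheet name
    by_position = pd.read_excel(path, sheet_name=1)       # zero-based position
    all_sheets = pd.read_excel(path, sheet_name=None)     # dict of every sheet
    return by_name, by_position, all_sheets


result = None
with tempfile.TemporaryDirectory() as folder:
    try:
        result = round_trip(os.path.join(folder, "demo.xlsx"))
    except ImportError:
        # .xlsx support depends on an optional engine such as openpyxl.
        print("No Excel engine installed; skipping the demo.")

if result is not None:
    by_name, by_position, all_sheets = result
    print(by_name.equals(by_position))   # both calls read the same sheet
    print(list(all_sheets))              # sheet names, in workbook order
```

Selecting by name is usually the more readable choice, while selecting by position keeps working if a sheet is renamed.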


Besides CSV and Excel files, the pandas package allows us to import other file types, which we do not cover in this tutorial. For a full list of supported file types, visit the pandas documentation website.