# Pandas, an introduction

**Learning Objectives:**
1. Gain an introduction to *DataFrames*, the objects that the *pandas* library uses to store tables.
2. Understand the following:
    1. Modules
    2. Objects
    3. DataFrames

[*pandas*](http://pandas.pydata.org/) is an open source module for Python. A module is a set of functions which can be imported into Python. *pandas* contains many functions and tools for data analysis and data handling. 

This course will provide you with a basic introduction to *pandas*. To take this course further or to learn more, the [*pandas* documentation](https://pandas.pydata.org/pandas-docs/stable/index.html) contains many tutorials and extensive documentation.

###### What is a module?

A module (also called a library) is a collection of extra tools and functions you can use in Python. It can be thought of as a plug-in that extends the functionality of Python.

Modules need to be imported into Python. To import a module, use `import` followed by the module name (*e.g.* to import *pandas*, use `import pandas`).

###### What is the Pandas library used for?

As mentioned above, the *pandas* module specialises in data analysis and data handling.

It is one of the industry standards for data analysis and manipulation, and its DataFrame is very similar to the data frame found in another programming language called *R*. *pandas* is used extensively in financial applications, which are very relevant to Economics, Business and Social Science students.

###### How to import a module:

Below is a piece of code that imports the *pandas* module. We must include this line of code if we want to use the functions from the library.

In [None]:
import pandas # This line imports the pandas library
print("We have now successfully imported the pandas Library")

Now that the library has been imported, we can use all of its functions. However, to call a function we need to use the syntax `pandas.FUNCTION`.

The code below is an example that creates a *Series*. Don't worry if you don't understand what is going on; the key thing is to grasp how we call functions from the *pandas* library.

In [None]:
st_series = pandas.Series([1997,1980,1983,1999,2002,2005])
# See how we have had to use pandas.Series() to call the function, not just Series()
st_series

To save us from having to write `pandas.FUNCTION` every time, we can give the *pandas* library a nickname when we import it. Notice how we can now use `pd.FUNCTION`, as we have given *pandas* the nickname `pd`.

In [None]:
import pandas as pd
MySeries = pd.Series(["I","like","Python"])
MySeries
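The nickname is only a second name for the same module, not a separate copy. A minimal sketch to check this (the variable names `short_series` and `long_series` are made up for this example):

```python
import pandas
import pandas as pd

# Both names point at the same module object, so either can be used.
print(pd is pandas)

# The same function is reachable through either name.
short_series = pd.Series([1, 2, 3])
long_series = pandas.Series([1, 2, 3])
print(short_series.equals(long_series))
```

Both lines should print `True`. In practice, `pd` is the nickname most *pandas* code uses.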

###### What is a `.csv` file?

As *pandas* is used for data analysis we need a way to import data into our program so we can use it. 

A `.csv` file is a series of strings separated by commas, known as **c**omma-**s**eparated **v**alues (CSV, hence the `.csv` file suffix). Each string between commas represents a single cell in a spreadsheet. CSV files can be opened in spreadsheet applications such as *Microsoft Excel* or even in simple text editors such as *Notepad*.

Typically, the first line in the `.csv` file contains the header information of the data. Opened in a text editor such as *Notepad*, a `.csv` file shows one table row per line, with the values in each row separated by commas.

A `.csv` file holding Star Wars film data could represent the table below:

| Name                    | Year Released |  Rating |
|-------------------------|---------------|---------|
| A New Hope              | 1977          | 8/10    |
| The Empire Strikes Back | 1980          | 9/10    |
| Return of the Jedi      | 1983          | 8/10    |
| The Phantom Menace      | 1999          | 6/10    |
| Attack of the Clones    | 2002          | 6/10    |
| Revenge of the Sith     | 2005          | 7/10    |
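
As a sketch of how such a file reaches *pandas*, the block below writes out CSV text matching the table above and loads it with `pd.read_csv`. The CSV text and the variable names `csv_text` and `films` are assumptions made for illustration; `io.StringIO` is used only so that the example runs without a file on disk.

```python
import io
import pandas as pd

# Hypothetical file contents, written to match the table above.
# The first line holds the headers; every other line is one row.
csv_text = """Name,Year Released,Rating
A New Hope,1977,8/10
The Empire Strikes Back,1980,9/10
Return of the Jedi,1983,8/10
The Phantom Menace,1999,6/10
Attack of the Clones,2002,6/10
Revenge of the Sith,2005,7/10
"""

# io.StringIO lets pandas read the string as if it were a file.
films = pd.read_csv(io.StringIO(csv_text))
print(films)
```

With a real file, the path would be passed directly instead, for example `pd.read_csv("starwars.csv")`, where `starwars.csv` is a hypothetical file name.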


###### What is a DataFrame?

When we import tabular data into *pandas*, it is stored in an object known as a DataFrame. An object is a collection of variables; for a DataFrame, these variables include information on the data, headers and indexes.

To output the DataFrame we just need to call the object's name.

Below is some code which 'hard-codes' data into the `starwars_df` object. You do not need to worry about how this code works.

In [None]:
starwars_data = [
    ["A New Hope",1977,"8/10"],
    ["The Empire Strikes Back",1980,"9/10"],
    ["Return of the Jedi",1983,"8/10"],
    ["The Phantom Menace",1999,"6/10"],
    ["Attack of the Clones",2002,"6/10"],
    ["Revenge of the Sith",2005,"7/10"],
]
#Above is the hard-coded data


starwars_df = pd.DataFrame(starwars_data,columns=[
        "Name",
        "Year Released",
        "Rating"
    ]
)

To output the data from the `starwars_df` DataFrame all we need to do is call its name. See below.

In [None]:
starwars_df
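
The variables held inside a DataFrame (the data, headers and index mentioned earlier) can be inspected directly. A minimal sketch, using a shortened copy of the table so that the block runs on its own; the name `films` is chosen for this example only:

```python
import pandas as pd

# A shortened copy of the table, so this block runs on its own.
films = pd.DataFrame(
    [["A New Hope", 1977, "8/10"], ["The Empire Strikes Back", 1980, "9/10"]],
    columns=["Name", "Year Released", "Rating"],
)

print(films.columns)  # the headers
print(films.index)    # the row labels (0, 1, ... by default)
print(films.values)   # the underlying data as an array
print(films.shape)    # (number of rows, number of columns)
```

The same attributes are available on `starwars_df`, for example `starwars_df.columns`.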