<div align="center">
<h1> Introduction to pandas</h1>
</div>


In the following sections you will learn:


   * How to import Pandas and create Pandas Series

### Downloading Pandas

Pandas is included with Anaconda. If you don't already have Anaconda installed on your computer, please refer to the Anaconda section to get clear instructions on how to install Anaconda on your PC or Mac.

### Pandas Versions

As with many Python packages, Pandas is updated from time to time. The following sections were created using Pandas version 0.22. You can check which version of Pandas you have by typing **!conda list pandas** in your Jupyter notebook or **conda list pandas** in the Anaconda prompt. If you have a different version of Pandas installed on your computer, you can switch to this version by typing **conda install pandas=0.22** in the Anaconda prompt. As newer versions of Pandas are released, some functions may become obsolete or be replaced, so make sure you have the correct Pandas version before running the code. This helps ensure that the code runs as shown.

```
!conda list pandas
```

```
# packages in environment at C:\Users\ziaeeamir\AppData\Local\Continuum\anaconda3:
#
# Name                    Version                   Build  Channel
pandas                    0.22.0           py36h6538335_0  
```
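You can also check the installed version from inside Python, without calling conda, by reading the `pd.__version__` string. The sketch below compares it against 0.22, the version these sections were written with. The name `EXPECTED_VERSION` is only an illustrative choice, and the simple prefix comparison assumes version strings of the form `0.22.0`.

```python
import pandas as pd

# Version used to create these sections (from the text above);
# EXPECTED_VERSION is an illustrative name, not part of Pandas
EXPECTED_VERSION = '0.22'

# pd.__version__ is a string such as '0.22.0'
installed = pd.__version__

# Assumes a 'major.minor.patch' style version string
if installed == EXPECTED_VERSION or installed.startswith(EXPECTED_VERSION + '.'):
    print('Pandas', installed, 'matches the version used in these sections')
else:
    print('Pandas', installed, 'is installed; these sections used version', EXPECTED_VERSION)
```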


### Pandas Documentation

Pandas is a remarkable data analysis library with many functions and features. In these introductory sections we will only scratch the surface of what Pandas can do. If you want to learn more about Pandas, make sure you check out the [Pandas Documentation](https://pandas.pydata.org/pandas-docs/stable/).


### Why Use Pandas?

The recent success of machine learning algorithms is partly due to the huge amounts of data that we have available to train our algorithms on. However, when it comes to data, quantity is not the only thing that matters; the quality of your data is just as important. Large datasets often don't come ready to be fed into your learning algorithms. More often than not, they contain missing values, outliers, incorrect values, etc. Data with many missing or bad values, for example, will keep your machine learning algorithms from performing well. Therefore, one very important step in machine learning is to look at your data first and make sure it is well suited for your training algorithm by doing some basic data analysis. This is where Pandas comes in. Pandas Series and DataFrames are designed for fast data analysis and manipulation, while also being flexible and easy to use. Below are just a few features that make Pandas an excellent package for data analysis:

   * Allows the use of labels for rows and columns
   * Can calculate rolling statistics on time series data
   * Handles NaN values easily
   * Can load data in different formats into DataFrames
   * Can join and merge different datasets together
   * Integrates with NumPy and Matplotlib
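To make a few of these features concrete, here is a small sketch using made-up temperature readings. The numbers and day labels are hypothetical, and plain day names stand in for a real time index. It shows label-based lookup, counting and filling a NaN value, and a rolling mean over a two-day window, with NumPy supplying the NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical readings for five days; np.nan marks a missing value
temps = pd.Series([20.0, 22.0, np.nan, 21.0, 23.0],
                  index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'])

# Labels: look up a value by its label instead of its position
print(temps['Tue'])  # 22.0

# NaN handling: count the missing values, then replace them with the mean
# of the values that are present (mean() skips NaN by default)
print(temps.isnull().sum())  # 1
filled = temps.fillna(temps.mean())

# Rolling statistics: average of each value and the one before it
rolling_mean = filled.rolling(window=2).mean()
print(rolling_mean)
```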

For these and other reasons, Pandas DataFrames have become one of the most commonly used objects for data analysis in Python.


### Creating pandas Series

A Pandas Series is a one-dimensional array-like object that can hold many data types, such as numbers or strings. One of the main differences between Pandas Series and NumPy ndarrays is that you can assign an index label to each element in a Pandas Series. In other words, you can name the indices of your Pandas Series anything you want. Another big difference is that a single Pandas Series can hold data of different data types, whereas a NumPy ndarray built from mixed values typically converts them to one common type.
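Here is a quick way to see both differences side by side. The sketch builds a NumPy array and a Pandas Series from the same mixed list; the labels 'a' to 'd' are arbitrary. NumPy converts every element to a string, while the Series keeps 30 as an integer and stores the values with the generic object dtype.

```python
import numpy as np
import pandas as pd

mixed = [30, 6, 'Yes', 'No']

# NumPy converts the mixed values to one common type: Unicode strings
arr = np.array(mixed)
print(arr.dtype.kind)   # U, meaning Unicode string
print(arr[0] == '30')   # True: the number 30 became the string '30'

# A Pandas Series keeps each element as it was (object dtype)
# and lets us choose our own index labels
ser = pd.Series(mixed, index=['a', 'b', 'c', 'd'])
print(ser.dtype)        # object
print(ser['a'] + 1)     # 31: 30 is still an integer
```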

Let's start by importing Pandas into Python. By convention, Pandas is imported as pd, so you can import it by typing the following command in your Jupyter notebook:

```python
import pandas as pd
```

Let's begin by creating a Pandas Series. You can create a Pandas Series with **pd.Series(data, index)**, where data holds the values and index is a list of index labels. Let's use a Pandas Series to store a grocery list. We will use the food items as index labels and the quantity we need to buy of each item as our data.

```python
# We import Pandas as pd into Python
import pandas as pd

# We create a Pandas Series that stores a grocery list
groceries = pd.Series(data=[30, 6, 'Yes', 'No'], index=['eggs', 'apples', 'milk', 'bread'])

# We display the Groceries Pandas Series
groceries
```

```
eggs       30
apples      6
milk      Yes
bread      No
dtype: object
```

We see that Pandas Series are displayed with the indices in the first column and the data in the second column. Notice that the data is not indexed 0 to 3 but rather with the names of the food items we put in, namely eggs, apples, etc. Also notice that the data in our Pandas Series contains both integers and strings, which is why the last line of the output reports dtype: object.
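For comparison, if the index argument is left out, Pandas labels the elements with the default integer positions 0 to n-1. The sketch below reuses the grocery data without labels; the variable name `unlabeled` is only for illustration.

```python
import pandas as pd

# Same data as the grocery list, but without the index argument
unlabeled = pd.Series(data=[30, 6, 'Yes', 'No'])
print(unlabeled)

# The default labels are the integers 0 through 3
print(list(unlabeled.index))  # [0, 1, 2, 3]
```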

Just like NumPy ndarrays, Pandas Series have attributes that allow us to get information from the Series easily. Let's look at some of them:

```python
# We print some information about Groceries
print('Groceries has shape:', groceries.shape)
print('Groceries has dimension:', groceries.ndim)
print('Groceries has a total of', groceries.size, 'elements')
```

```
Groceries has shape: (4,)
Groceries has dimension: 1
Groceries has a total of 4 elements
```
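Because a Series is one-dimensional, its shape is a one-element tuple and its size equals its length. The sketch below checks this against a NumPy array built from the same values and also reads the `dtype` attribute; it recreates `groceries` so that it runs on its own.

```python
import numpy as np
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'], index=['eggs', 'apples', 'milk', 'bread'])
arr = np.array([30, 6, 'Yes', 'No'])

# shape, ndim, and size match those of a NumPy array with the same number of elements
print(groceries.shape == arr.shape)  # True
print(groceries.ndim == arr.ndim)    # True
print(groceries.size == arr.size)    # True

# For a one-dimensional Series, size is the same as len()
print(len(groceries) == groceries.size)  # True

# dtype reports how the values are stored
print(groceries.dtype)  # object
```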


We can also print the index labels and the data of the Pandas Series separately. This is useful if you don't happen to know what the index labels of the Pandas Series are.

```python
print('The data in Groceries is:', groceries.values)
print('The index of Groceries is:', groceries.index)
```

```
The data in Groceries is: [30 6 'Yes' 'No']
The index of Groceries is: Index(['eggs', 'apples', 'milk', 'bread'], dtype='object')
```
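If you need the labels or the data as plain Python lists, you can call `tolist()` on the index and on the values, and `items()` lets you walk through label and value pairs together. This is a small sketch that recreates `groceries` so that it runs on its own.

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'], index=['eggs', 'apples', 'milk', 'bread'])

# Convert the index labels and the data to regular Python lists
labels = groceries.index.tolist()
values = groceries.values.tolist()
print(labels)  # ['eggs', 'apples', 'milk', 'bread']
print(values)  # [30, 6, 'Yes', 'No']

# Iterate over (label, value) pairs
for label, value in groceries.items():
    print(label, '->', value)
```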


If you are dealing with a very large Pandas Series and are not sure whether an index label exists, you can check by using the **in** operator:

```python
# We check whether bananas is a food item (an index) in Groceries
x = 'bananas' in groceries

# We check whether bread is a food item (an index) in Groceries
y = 'bread' in groceries

# We print the results
print('Is bananas an index label in Groceries:', x)
print('Is bread an index label in Groceries:', y)
```

```
Is bananas an index label in Groceries: False
Is bread an index label in Groceries: True
```
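Keep in mind that **in** checks the index labels, not the data. To check whether a value is present, apply **in** to the `values` attribute instead, as in this short sketch (it recreates `groceries` so that it runs on its own).

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'], index=['eggs', 'apples', 'milk', 'bread'])

# 30 is a value in the data, not an index label, so this is False
in_labels = 30 in groceries

# Searching the underlying data instead finds it
in_values = 30 in groceries.values

print(in_labels)  # False
print(in_values)  # True
```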
