# Introduction to Pandas

Pandas is a popular open-source Python library for data manipulation and analysis. It provides labeled data structures and analysis functions for working with structured (tabular) data.

**Data Structures:**

- Series: A one-dimensional labeled array capable of holding any data type.
- DataFrame: A two-dimensional labeled data structure with columns of potentially different data types. It is similar to a spreadsheet or SQL table.
- Panel: A three-dimensional labeled data structure that served as a container for multiple DataFrames. It was deprecated in pandas 0.20 and removed in 0.25; a DataFrame with a MultiIndex is the usual replacement.
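
As a minimal sketch of the two structures in everyday use, the snippet below builds a small Series and a DataFrame. The `continent` column and the variable names are illustrative assumptions, not part of the dataset used later.

```python
import pandas as pd

# A Series: one-dimensional, with one label per value.
population = pd.Series(
    [35.467, 63.951],
    index=["Canada", "France"],
    name="population",
)

# A DataFrame: two-dimensional, and each column may have its own dtype.
# The "continent" column is an illustrative addition.
countries = pd.DataFrame(
    {
        "population": [35.467, 63.951],
        "continent": ["America", "Europe"],
    },
    index=["Canada", "France"],
)

# Selecting a single column of a DataFrame returns a Series.
column = countries["population"]
print(type(column).__name__)  # Series
print(countries.shape)        # (2, 2)
```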

**Data Input/Output:**

- Reading and writing data in file formats such as CSV and Excel, and to and from SQL databases.
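
A CSV round trip might look like the sketch below. To keep it self-contained it reads from an in-memory string rather than a file on disk; the column names and the `csv_text` variable are assumptions made for illustration.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk (hypothetical content).
csv_text = "country,population\nCanada,35.467\nFrance,63.951\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Calling to_csv without a path returns the CSV text as a string;
# index=False leaves the row index out of the output.
round_trip = df.to_csv(index=False)
```

With a real file, the same calls would take a path instead, for example `pd.read_csv("data.csv")` and `df.to_csv("out.csv", index=False)`, where both file names are placeholders.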

**Data Manipulation:**

- Filtering, selecting, and transforming data.
- Handling missing values.
- Sorting and ranking data.
- Applying mathematical operations and functions to data.
- Merging, joining, and reshaping datasets.
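
The operations listed above can be sketched on a toy table. The missing value for Italy and the `continents` lookup table are invented here purely to give `dropna` and `merge` something to do.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "country": ["Canada", "France", "Germany", "Italy"],
        # NaN is inserted deliberately to demonstrate missing-value handling.
        "population": [35.467, 63.951, 80.940, np.nan],
    }
)

# Handle missing values: drop rows that have no population.
clean = df.dropna(subset=["population"])

# Filter with a boolean mask, then sort from largest to smallest.
large = clean[clean["population"] > 50].sort_values("population", ascending=False)

# Merge with a second (illustrative) table on the shared "country" column.
continents = pd.DataFrame(
    {
        "country": ["Canada", "France", "Germany"],
        "continent": ["America", "Europe", "Europe"],
    }
)
merged = large.merge(continents, on="country", how="left")
```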

**Data Analysis:**

- Aggregating and summarizing data.
- Grouping and pivoting data.
- Computing descriptive statistics.
- Applying advanced statistical functions.
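
A short sketch of grouping and summarizing follows. It reuses the population figures from the Series section and adds a `continent` column as an assumed grouping key.

```python
import pandas as pd

df = pd.DataFrame(
    {
        # Illustrative grouping key, one entry per G7 country.
        "continent": ["America", "Europe", "Europe", "Europe", "Asia", "Europe", "America"],
        "population": [35.467, 63.951, 80.940, 68.665, 127.061, 64.511, 318.523],
    }
)

# Group rows by continent and aggregate each group with a sum.
totals = df.groupby("continent")["population"].sum()

# describe() returns count, mean, std, min, quartiles, and max in one call.
stats = df["population"].describe()
```

Here `totals` is a Series indexed by continent, and `stats` is a Series indexed by statistic name, so individual values can be read with `totals["Europe"]` or `stats["max"]`.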

# Working with Pandas

In [2]:
import pandas as pd
import numpy as np

## Pandas Series

### Creating a Series

In [4]:
g7_pop = pd.Series([35.467, 63.951, 80.940, 68.665, 127.061, 64.511, 318.523])
g7_pop

0     35.467
1     63.951
2     80.940
3     68.665
4    127.061
5     64.511
6    318.523
dtype: float64

### Setting the name

In [6]:
g7_pop.name = 'G7 Population in millions'
g7_pop

0     35.467
1     63.951
2     80.940
3     68.665
4    127.061
5     64.511
6    318.523
Name: G7 Population in millions, dtype: float64

### Inspecting the dtype and values

In [7]:
g7_pop.dtype

dtype('float64')

In [8]:
g7_pop.values

array([ 35.467,  63.951,  80.94 ,  68.665, 127.061,  64.511, 318.523])

In [9]:
type(g7_pop.values)

numpy.ndarray

### Accessing elements

In [10]:
g7_pop[0]

35.467

In [11]:
g7_pop[1]

63.951

In [12]:
g7_pop.index

RangeIndex(start=0, stop=7, step=1)

### Changing the index

In [13]:
g7_pop.index = [
    'Canada',
    'France',
    'Germany',
    'Italy',
    'Japan',
    'United Kingdom',
    'United States'
]

In [14]:
g7_pop

Canada             35.467
France             63.951
Germany            80.940
Italy              68.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

### Creating a Series from a dictionary

In [15]:
pd.Series({
    'Canada': 35.467,
    'France': 63.951,
    'Germany': 80.94,
    'Italy' : 60.665,
    'Japan' : 127.061,
    'United Kingdom' : 64.511,
    'United States' : 318.523
})

Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
dtype: float64

In [16]:
pd.Series(
    [35.467, 63.951, 80.94, 68.665, 127.061, 64.511, 318.523],
    index=[
        'Canada',
        'France',
        'Germany',
        'Italy',
        'Japan',
        'United Kingdom',
        'United States'
    ],
    name='G7 Population in millions'
)

Canada             35.467
France             63.951
Germany            80.940
Italy              68.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

In [17]:
pd.Series(g7_pop, index=['France', 'Germany', 'Italy', 'Spain'])

France     63.951
Germany    80.940
Italy      68.665
Spain         NaN
Name: G7 Population in millions, dtype: float64
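
Passing an existing Series together with a new `index` aligns the values by label, which is why `'Spain'`, absent from `g7_pop`, comes back as `NaN`. The sketch below shows what should be the equivalent call with `Series.reindex`; it rebuilds `g7_pop` so that it runs on its own, and the names `subset` and `known` are illustrative.

```python
import pandas as pd

g7_pop = pd.Series(
    [35.467, 63.951, 80.940, 68.665, 127.061, 64.511, 318.523],
    index=[
        "Canada", "France", "Germany", "Italy",
        "Japan", "United Kingdom", "United States",
    ],
    name="G7 Population in millions",
)

# reindex keeps the values whose labels exist and fills the rest with NaN.
subset = g7_pop.reindex(["France", "Germany", "Italy", "Spain"])

# dropna removes the labels that had no match in the original Series.
known = subset.dropna()
```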

### Indexing

In [18]:
g7_pop['Canada']

35.467

In [19]:
g7_pop['Japan']

127.061

In [20]:
g7_pop.iloc[0]

35.467

In [21]:
g7_pop.iloc[-1]

318.523

In [23]:
g7_pop[['Italy', 'France']]

Italy     68.665
France    63.951
Name: G7 Population in millions, dtype: float64
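
The same two access styles extend to slices, with one difference worth noting: label slices through `.loc` include the end label, while position slices through `.iloc` exclude the stop position. A minimal sketch, rebuilding `g7_pop` so that it runs on its own (the result variable names are illustrative):

```python
import pandas as pd

g7_pop = pd.Series(
    [35.467, 63.951, 80.940, 68.665, 127.061, 64.511, 318.523],
    index=[
        "Canada", "France", "Germany", "Italy",
        "Japan", "United Kingdom", "United States",
    ],
    name="G7 Population in millions",
)

# Label-based slice: both endpoints are included (France, Germany, Italy).
by_label = g7_pop.loc["France":"Italy"]

# Position-based slice: the stop position is excluded (France, Germany).
by_position = g7_pop.iloc[1:3]

# A list of positions selects several elements at once (first and last).
ends = g7_pop.iloc[[0, -1]]
```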