# Pandas intro

Pandas provides two main data structures:
- Series
- DataFrame

## Series

In [1]:
import pandas as pd

In [2]:
names = pd.Series(['Peter', 'John', 'Tom', 'Melisa', 'Ann'])
names

0     Peter
1      John
2       Tom
3    Melisa
4       Ann
dtype: object

In [3]:
numbers = pd.Series([10, -1, 123, -524, 1024])
numbers

0      10
1      -1
2     123
3    -524
4    1024
dtype: int64

In [4]:
countries = ["Poland", "Germany", "Russia", "USA", "UK"]
population = [38_123_543, 82_465_123, 150_872_112, 320_465_123, 62_649_122]
s = pd.Series(population, index=countries)  # countries become the index labels
s

Poland      38123543
Germany     82465123
Russia     150872112
USA        320465123
UK          62649122
dtype: int64

What can we do with a Series object?
- use built-in methods like mean()
- fetch elements from the Series using the index
- filter elements

In [5]:
s.mean()

130915004.6

In [6]:
s[s > 100_000_000]

Russia    150872112
USA       320465123
dtype: int64
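
Fetching elements using the index, mentioned in the list above, could look like the sketch below. It rebuilds the same `s` Series so that it runs on its own. The variable names `by_label`, `by_position` and `subset` are only illustrative.

```python
import pandas as pd

countries = ["Poland", "Germany", "Russia", "USA", "UK"]
population = [38_123_543, 82_465_123, 150_872_112, 320_465_123, 62_649_122]
s = pd.Series(population, index=countries)

by_label = s.loc["Germany"]        # look up one element by its index label
by_position = s.iloc[0]            # look up one element by its integer position
subset = s.loc[["Poland", "UK"]]   # a list of labels returns a smaller Series

print(by_label)
print(by_position)
print(subset)
```

Using `.loc` for labels and `.iloc` for positions keeps the two kinds of lookup explicit, which helps when the index itself contains integers.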

## DataFrame

A DataFrame is a data structure built from Series. Several Series are combined into one DataFrame, where each Series becomes a column and all columns share the same index.
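
As a minimal sketch of that idea, we can build a small DataFrame from two Series and check that a selected column is again a Series with the same index. The names `names`, `ages` and `df` and the index values are assumptions made for this illustration.

```python
import pandas as pd

index = [110, 120, 130, 140]
names = pd.Series(['John', 'Ann', 'Adam', 'Melisa'], index=index)
ages = pd.Series([20, 25, 30, 35], index=index)

# each Series becomes one column; the dict keys become the column names
df = pd.DataFrame({'Name': names, 'Age': ages})

age_column = df['Age']                      # selecting one column gives a Series
print(type(age_column))
print(age_column.index.equals(df.index))    # the column shares the DataFrame index
```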

In [8]:
people = pd.DataFrame([
    ['John', 20, 'M'],
    ['Ann', 25, 'F'],
    ['Adam', 30, 'M'],
    ['Melisa', 35, 'F']
], columns=['Name', 'Age', 'Gender'])
people

     Name  Age Gender
0    John   20      M
1     Ann   25      F
2    Adam   30      M
3  Melisa   35      F


In [9]:
people.columns

Index(['Name', 'Age', 'Gender'], dtype='object')

In [10]:
people.dtypes

Name      object
Age        int64
Gender    object
dtype: object

In [11]:
people.index

RangeIndex(start=0, stop=4, step=1)

In [12]:
people.size

12

In [13]:
people = pd.DataFrame([
    ['John', 20, 'M'],
    ['Ann', 25, 'F'],
    ['Adam', 30, 'M'],
    ['Melisa', 35, 'F']
], 
    columns=['Name', 'Age', 'Gender'],
    index=[110, 120, 130, 140]
)
people

       Name  Age Gender
110    John   20      M
120     Ann   25      F
130    Adam   30      M
140  Melisa   35      F


In [14]:
people.index

Int64Index([110, 120, 130, 140], dtype='int64')

In [16]:
# we can also provide the data column by column (each column becomes a Series)
people = pd.DataFrame({
    'Name': ['John', 'Ann', 'Adam', 'Melisa'],
    'Age': [20, 25, 30, 35],
    'Gender': ['M', 'F', 'M', 'F'],
}, 
    index=[110, 120, 130, 140]
)
people

       Name  Age Gender
110    John   20      M
120     Ann   25      F
130    Adam   30      M
140  Melisa   35      F
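
The row-wise and column-wise constructions shown above should describe the same table. One way to check this is `DataFrame.equals`, as in the sketch below. The names `by_rows` and `by_columns` are only illustrative.

```python
import pandas as pd

index = [110, 120, 130, 140]

# data given row by row, with the column names passed separately
by_rows = pd.DataFrame([
    ['John', 20, 'M'],
    ['Ann', 25, 'F'],
    ['Adam', 30, 'M'],
    ['Melisa', 35, 'F'],
], columns=['Name', 'Age', 'Gender'], index=index)

# the same data given column by column, with the dict keys as column names
by_columns = pd.DataFrame({
    'Name': ['John', 'Ann', 'Adam', 'Melisa'],
    'Age': [20, 25, 30, 35],
    'Gender': ['M', 'F', 'M', 'F'],
}, index=index)

# equals() compares the values, the dtypes and the index/column labels
print(by_rows.equals(by_columns))
```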
