# Pandas
https://pandas.pydata.org/
Pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool built on top of the Python programming language and NumPy. Because operations are vectorized over whole columns, there is usually no need to loop over individual elements in Python.


In [1]:
import pandas as pd
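
As a rough illustration of the "no need to loop" point, the sketch below compares an explicit Python loop with the equivalent vectorized expression. The `prices` data and the 1.25 factor are made up for the example.

```python
import pandas as pd

# Hypothetical example data, not part of the original notebook
prices = pd.Series([10.0, 20.0, 30.0])

# Explicit Python loop: handles one element at a time
looped = pd.Series([p * 1.25 for p in prices])

# Vectorized: the multiplication is applied to the whole Series at once
vectorized = prices * 1.25

print(vectorized)
```

Both approaches give the same values; the vectorized form is shorter and typically much faster on large data.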


## Series
A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). Each column of a DataFrame is a Series.

In [10]:
fruits = pd.Series(["apple", "pear", "orange", "melon", "kiwi"], index=[5,5,"sju",8,9])
print(fruits)

print(fruits["sju"])
# Looking up a label that is not in the index raises KeyError, much like a dict.
# Unlike dict keys, index labels do not have to be unique (5 appears twice here).

5       apple
5        pear
sju    orange
8       melon
9        kiwi
dtype: object
orange
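
The dict-like behaviour described in the comments can be sketched as follows. The Series `s` and its string labels are a simplified stand-in for `fruits`, chosen only for illustration; `.loc` is used to make the label-based lookup explicit.

```python
import pandas as pd

# Hypothetical Series with a duplicated label "a"
s = pd.Series(["apple", "pear", "orange"], index=["a", "a", "b"])

dupes = s.loc["a"]    # duplicated label: returns a Series with both rows
single = s.loc["b"]   # unique label: returns the scalar value

# A label that is not in the index raises KeyError
try:
    s.loc["z"]
    missing_raises = False
except KeyError:
    missing_raises = True

# Series.get works like dict.get and returns a default instead of raising
fallback = s.get("z", "n/a")

print(dupes)
print(single, missing_raises, fallback)
```

`s.index.is_unique` can be checked first when later code assumes one value per label.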


In [5]:
fruits.apply(str.upper)

5       APPLE
5        PEAR
sju    ORANGE
8       MELON
9        KIWI
dtype: object
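
For string columns, the `.str` accessor is a common alternative to `apply`. The sketch below uses a default-indexed copy of the fruit names, and the `with_missing` Series is made up to show how missing values are handled.

```python
import pandas as pd

fruits = pd.Series(["apple", "pear", "orange", "melon", "kiwi"])

# Element-wise function call
upper_apply = fruits.apply(str.upper)

# Vectorized string method via the .str accessor
upper_str = fruits.str.upper()

# The .str accessor passes missing values through instead of failing
with_missing = pd.Series(["apple", None])
upper_missing = with_missing.str.upper()

print(upper_str)
print(upper_missing)
```

Calling `with_missing.apply(str.upper)` would instead raise a `TypeError`, because `str.upper` cannot be applied to a missing value.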

In [4]:
numbers = pd.Series([5,6,3])
numbers * 2

0    10
1    12
2     6
dtype: int64

In [12]:
numbers >= 5

0     True
1     True
2    False
dtype: bool

In [13]:
numbers[numbers >= 5]

0    5
1    6
dtype: int64
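
Boolean masks can also be combined. A minimal sketch, reusing the same `numbers` values: note that pandas uses `&`, `|` and `~` rather than `and`, `or` and `not`, and that each comparison needs its own parentheses.

```python
import pandas as pd

numbers = pd.Series([5, 6, 3])

# AND: at least 4 and below 6
in_range = numbers[(numbers >= 4) & (numbers < 6)]

# OR: below 4 or above 5
outside = numbers[(numbers < 4) | (numbers > 5)]

# NOT: everything except 5
not_five = numbers[~(numbers == 5)]

print(in_range.tolist(), outside.tolist(), not_five.tolist())
```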

### Data types
| **Pandas dtype** | **Python type** | **NumPy type** | **Usage** |
|------------------|-----------------|----------------|-----------|
| object           | str or mixed    | string_, unicode_, mixed types | Text or mixed numeric and non-numeric values |
| int64            | int             | int_, int8, int16, int32, int64, uint8, uint16, uint32, uint64 | Integer numbers |
| float64          | float           | float_, float16, float32, float64 | Floating point numbers |
| bool             | bool            | bool_          | True/False values |
| datetime64[ns]   | NA              | datetime64[ns] | Date and time values |
| timedelta64[ns]  | NA              | timedelta64[ns] | Differences between two datetimes |
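
The dtypes in the table can be inspected with the `.dtype` attribute and converted with `astype`. This is a small sketch with made-up values; the exact datetime resolution (for example `[ns]`) may differ between pandas versions, so the example checks the dtype family rather than the exact name.

```python
import pandas as pd

ints = pd.Series([5, 6, 3])
floats = ints.astype("float64")   # explicit conversion to float
flags = ints >= 5                 # a comparison yields a bool Series

# Parsing strings into datetimes; subtracting datetimes gives timedeltas
stamps = pd.to_datetime(pd.Series(["2024-01-01", "2024-01-03"]))
gaps = stamps - stamps.iloc[0]

print(ints.dtype, floats.dtype, flags.dtype)
print(stamps.dtype, gaps.dtype)
```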


In [15]:
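# Bare booleans inside [] form a tuple, which pandas tries to look up as a single label.
# A boolean mask has to be passed as a list (or Series): fruits[[True, True, False, False, True]]
fruits[True,True,False,False,True]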

KeyError: 'key of type tuple not found and not a MultiIndex'
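
The error above goes away when the booleans are wrapped in a list, so that pandas treats them as a mask with one entry per row. A minimal sketch, using a default-indexed copy of the fruit names:

```python
import pandas as pd

fruits = pd.Series(["apple", "pear", "orange", "melon", "kiwi"])

# One boolean per row; True keeps the row, False drops it
mask = [True, True, False, False, True]
picked = fruits[mask]

print(picked)
```

The mask must have the same length as the Series, and the surviving rows keep their original index labels.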


## DataFrame
DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object.

In [18]:
data = {
    "Name": ["Anders", "Tomas", "Mario"],
    "Age": [42,46,36]
}

data

df = pd.DataFrame(data)
df

|   | Name   | Age |
|---|--------|-----|
| 0 | Anders | 42  |
| 1 | Tomas  | 46  |
| 2 | Mario  | 36  |


In [22]:
data = [
    {"Name": "Anders", "Age": 42},
    {"Name": "Tomas", "Age": 46},
    {"Name": "Mario", "Age": 36}
]

df = pd.DataFrame(data)
df

|   | Name   | Age |
|---|--------|-----|
| 0 | Anders | 42  |
| 1 | Tomas  | 46  |
| 2 | Mario  | 36  |
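
The column-oriented and row-oriented inputs used here, as well as the "dict of Series" view mentioned in the DataFrame description, should all produce the same table. The sketch below checks this with `DataFrame.equals`; the variable names are chosen for the example.

```python
import pandas as pd

# Column-oriented: a dict mapping column names to lists
from_columns = pd.DataFrame({
    "Name": ["Anders", "Tomas", "Mario"],
    "Age": [42, 46, 36],
})

# Row-oriented: a list of dicts, one per row
from_rows = pd.DataFrame([
    {"Name": "Anders", "Age": 42},
    {"Name": "Tomas", "Age": 46},
    {"Name": "Mario", "Age": 36},
])

# A dict of Series objects, one Series per column
from_series = pd.DataFrame({
    "Name": pd.Series(["Anders", "Tomas", "Mario"]),
    "Age": pd.Series([42, 46, 36]),
})

same = from_columns.equals(from_rows) and from_columns.equals(from_series)
print(same)
```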


In [23]:
# Selecting a single column of a DataFrame yields a Series
df["Name"]

0    Anders
1     Tomas
2     Mario
Name: Name, dtype: object
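
The type of the result depends on how the column is selected. A short sketch, rebuilding the same `df` so that it runs on its own: a single label returns a Series, while a list of labels returns a DataFrame, even when the list has only one entry.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Anders", "Tomas", "Mario"],
    "Age": [42, 46, 36],
})

col = df["Name"]            # single label: Series
sub = df[["Name"]]          # list with one label: one-column DataFrame
both = df[["Name", "Age"]]  # list of labels: DataFrame with those columns

print(type(col).__name__, type(sub).__name__, both.shape)
```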

In [29]:
# Add 5 to every age; the whole column is updated in one vectorized operation
df["Age"] = df["Age"] + 5
df

|   | Name   | Age |
|---|--------|-----|
| 0 | Anders | 47  |
| 1 | Tomas  | 51  |
| 2 | Mario  | 41  |
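
Assigning to `df["Age"]` modifies the DataFrame in place, so re-running such a cell changes the values again. As a hedged alternative, `DataFrame.assign` returns a new DataFrame and leaves the original untouched. The sketch below uses a fresh copy of the starting data and an arbitrary increment of 1.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Anders", "Tomas", "Mario"],
    "Age": [42, 46, 36],
})

# assign() returns a modified copy; df itself is not changed
older = df.assign(Age=df["Age"] + 1)

print(df["Age"].tolist())
print(older["Age"].tolist())
```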


In [30]:
df["Age"] >= 50

0    False
1     True
2    False
Name: Age, dtype: bool

In [31]:
df[df["Age"] >= 50]

|   | Name  | Age |
|---|-------|-----|
| 1 | Tomas | 51  |
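
Row filtering can be combined with column selection through `.loc`, and several conditions can be joined with `&`. A minimal sketch, with the ages typed in directly to match the values shown in the output above:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Anders", "Tomas", "Mario"],
    "Age": [47, 51, 41],
})

# Rows where Age >= 50, keeping only the Name column
names_50_plus = df.loc[df["Age"] >= 50, "Name"]

# Rows where 40 <= Age < 50; each condition needs its own parentheses
forties = df[(df["Age"] >= 40) & (df["Age"] < 50)]

print(names_50_plus.tolist())
print(forties)
```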


In [33]:
# print() already calls str() on the DataFrame, giving the plain-text representation
print(df)

     Name  Age
0  Anders   47
1   Tomas   51
2   Mario   41
