# Pandas

**Pandas** is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool built on top of the Python programming language.

In [48]:
import pandas as pd

## Series
Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call:

```python
s = pd.Series(data, index=index)
```

In [49]:
fruits = pd.Series(["apple", "pear", "orange", "melon", "kiwi"], index=[1, 2, 3, 4, 5])  # the custom index is optional; the default is 0, 1, 2, ...
fruits

1     apple
2      pear
3    orange
4     melon
5      kiwi
dtype: object
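
Because this Series uses a custom index that starts at 1, looking up by label and looking up by position give different answers for the same number. The following is a minimal sketch that re-creates `fruits` from the cell above; the variable names are only illustrative.

```python
import pandas as pd

fruits = pd.Series(["apple", "pear", "orange", "melon", "kiwi"], index=[1, 2, 3, 4, 5])

# .loc looks up by index label, .iloc by zero-based position
by_label = fruits.loc[1]        # the element whose label is 1
by_position = fruits.iloc[1]    # the second element, whose label is 2
first_by_position = fruits.iloc[0]
```

With the default index (0, 1, 2, ...) the two lookups would coincide, which is why the difference is easy to overlook.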

In [50]:
fruits.apply(str.upper)

1     APPLE
2      PEAR
3    ORANGE
4     MELON
5      KIWI
dtype: object
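
For string data, pandas also offers the `.str` accessor, which can express the same transformation without passing a function to `apply`. A short sketch, again re-creating `fruits` so that it runs on its own:

```python
import pandas as pd

fruits = pd.Series(["apple", "pear", "orange", "melon", "kiwi"], index=[1, 2, 3, 4, 5])

# Element-wise upper-casing through the string accessor
upper_fruits = fruits.str.upper()
```

One practical difference: `.str` methods pass missing values through, whereas `apply(str.upper)` would raise an error on a non-string value.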

In [65]:
numbers = pd.Series([1, 2, 4, 8, 10, 11, 3, 1, 22, 141, 3, 5, 6, 3])
numbers * 2 + 1

0       3
1       5
2       9
3      17
4      21
5      23
6       7
7       3
8      45
9     283
10      7
11     11
12     13
13      7
dtype: int64

In [66]:
mask = numbers >= 5
type(mask)

pandas.core.series.Series

In [67]:
numbers[mask]

3       8
4      10
5      11
8      22
9     141
11      5
12      6
dtype: int64
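
Boolean masks can also be combined and negated. The sketch below assumes the same `numbers` Series as above; note that pandas uses `&`, `|` and `~` rather than `and`, `or` and `not`, and that each comparison needs its own parentheses.

```python
import pandas as pd

numbers = pd.Series([1, 2, 4, 8, 10, 11, 3, 1, 22, 141, 3, 5, 6, 3])

mask = numbers >= 5

# Keep values between 5 and 20 (inclusive): both conditions must hold
in_range = numbers[(numbers >= 5) & (numbers <= 20)]

# Invert the mask to keep the values below 5
below_five = numbers[~mask]
```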

## DataFrame
DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object.

In [73]:
data = {
    "Name": ["Anders", "Tomas", "Mario"],
    "Age": [42, 46, 36]
}

df = pd.DataFrame(data)
df

     Name  Age
0  Anders   42
1   Tomas   46
2   Mario   36


In [78]:
data = [
    {"Name": "Anders", "Age": 42},
    {"Name": "Tomas", "Age": 46},
    {"Name": "Mario", "Age": 36},
]

df = pd.DataFrame(data)
df

     Name  Age
0  Anders   42
1   Tomas   46
2   Mario   36
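
The "dict of Series" view mentioned in the introduction to this section can be made concrete in the same way. This is a minimal sketch; the `names` and `ages` variables are introduced here only for illustration.

```python
import pandas as pd

names = pd.Series(["Anders", "Tomas", "Mario"])
ages = pd.Series([42, 46, 36])

# Each dict key becomes a column name; the Series are aligned on their index
people = pd.DataFrame({"Name": names, "Age": ages})
```

All three construction styles should produce the same table here, since the Series share the default index 0, 1, 2.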


In [87]:
print(df["Age"])
print(type(df["Age"]))
print()
print(df["Name"])
print(type(df["Name"]))

0    42
1    46
2    36
Name: Age, dtype: int64
<class 'pandas.core.series.Series'>

0    Anders
1     Tomas
2     Mario
Name: Name, dtype: object
<class 'pandas.core.series.Series'>


In [103]:
df["Age"] = df["Age"] + 1
df  # Age increases by 1 each time this cell is run; the values below reflect repeated runs

     Name  Age
0  Anders   55
1   Tomas   59
2   Mario   49
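
Because the cell above overwrites `df["Age"]`, its result depends on how many times it has been run. One possible way to avoid that, sketched here under the assumption that the original ages should stay untouched, is to derive a new column with `assign`. The column name `AgeNextYear` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Anders", "Tomas", "Mario"], "Age": [42, 46, 36]})

# assign returns a new DataFrame and leaves df unchanged,
# so rerunning this line always gives the same result
older = df.assign(AgeNextYear=df["Age"] + 1)
```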


In [104]:
df["Age"] >= 50

0     True
1     True
2    False
Name: Age, dtype: bool

In [105]:
df[df["Age"] >= 50]

     Name  Age
0  Anders   55
1   Tomas   59
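
Row filtering and column selection can be combined in a single step with `.loc`. The sketch below rebuilds the DataFrame with the ages shown in the output above so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Anders", "Tomas", "Mario"], "Age": [55, 59, 49]})

# Rows where Age >= 50, and only the Name column
names_50_plus = df.loc[df["Age"] >= 50, "Name"]
```

The result is a Series rather than a DataFrame, because a single column label was requested.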


In [106]:
str(df)

'     Name  Age\n0  Anders   55\n1   Tomas   59\n2   Mario   49'