# Pandas Library - Basics
#### Index
1. Series
2. DataFrame

In [1]:
import pandas as pd
import numpy as np

In [5]:
print(pd.__version__)

1.4.2


## Series
* A Pandas Series is like a column in a table.
* It is a one-dimensional array holding data of any type.

**Creating Series from Lists and Tuples**

In [2]:
a = pd.Series([35.46, 78.89, 34.23, 97.12, 15.78])
a

0    35.46
1    78.89
2    34.23
3    97.12
4    15.78
dtype: float64
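The heading above mentions tuples, but the cell only shows a list. As a minimal sketch (the sample values below are made up for illustration), a tuple can be passed to `pd.Series()` in the same way:

```python
import pandas as pd

# A tuple works like a list: each element becomes one value of the Series.
temps = pd.Series((21.5, 19.0, 23.4))  # hypothetical sample values
print(temps)

# Integer inputs give an integer dtype instead of float64.
counts = pd.Series((3, 7, 2))
print(counts.dtype)
```

The default integer labels (0, 1, 2, ...) are assigned exactly as with a list.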

**Key/Value Objects as Series**
* You can also use a key/value object, like a dictionary, when creating a Series.
* The keys of the dictionary become the labels.

In [13]:
import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)
print(myvar)

day1    420
day2    380
day3    390
dtype: int64


In [12]:
# Selecting specific entries: only the keys listed in index are kept
import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories, index = ["day1", "day2"])
print(myvar)

day1    420
day2    380
dtype: int64
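It may help to see what happens when the `index` list contains a label that is not a key in the dictionary. The sketch below reuses the `calories` dictionary; `"day4"` is a hypothetical label added only for illustration:

```python
import numpy as np
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

# "day4" is a hypothetical label that is not a key in the dictionary.
subset = pd.Series(calories, index=["day1", "day4"])
print(subset)

# day1 keeps its value; day4 has no match and is filled with NaN.
# Because NaN is a float, the dtype becomes float64 instead of int64.
print(np.isnan(subset["day4"]))
```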


### Labels
* If nothing else is specified, the values are labeled with their index number: the first value has index 0, the second has index 1, and so on.
* This label can be used to access a specific value.

In [9]:
a = pd.Series([35.46, 78.89, 34.23, 97.12, 15.78])
print(a)
print(a[0])

0    35.46
1    78.89
2    34.23
3    97.12
4    15.78
dtype: float64
35.46


* **With the `index` argument of `pd.Series()`, or by assigning to the `index` attribute of an existing Series, you can name your own labels**.
* Once you have created labels, you can access an item by referring to its label.

In [10]:
# Assigning custom labels through the index attribute
a.index = ["Brasil", 
          "Russia", 
          "India",
          "China",
          "SA"]
print(a)
print(a["Russia"])

Brasil    35.46
Russia    78.89
India     34.23
China     97.12
SA        15.78
dtype: float64
78.89


In [3]:
a.name = "BRICS Population in millions"
a

0    35.46
1    78.89
2    34.23
3    97.12
4    15.78
Name: BRICS Population in millions, dtype: float64

In [5]:
a.dtype

dtype('float64')

In [6]:
a.values

array([35.46, 78.89, 34.23, 97.12, 15.78])

In [7]:
type(a.values)

numpy.ndarray

In [8]:
a.name

'BRICS Population in millions'

In [9]:
a.index

RangeIndex(start=0, stop=5, step=1)

Brasil    35.46
Russia    78.89
India     34.23
China     97.12
SA        15.78
Name: BRICS Population in millions, dtype: float64

In [None]:
certificates_earned = pd.Series(
    [8, 2, 5, 6],
    index=['Tom', 'Kris', 'Ahmad', 'Beau']
)

### Accessing Elements

In [13]:
a = pd.Series([35.46, 78.89, 34.23, 97.12, 15.78], 
             index = ["Brasil", "Russia", "India", "China", "SA"], name = "Brics Nations GDP")
a

Brasil    35.46
Russia    78.89
India     34.23
China     97.12
SA        15.78
Name: Brics Nations GDP, dtype: float64

In [16]:
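# accessing by integer position (works in pandas 1.4; on a labeled Series
# this fallback is deprecated in pandas 2.x, where a.iloc[2] is preferred)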
print(a[2])

34.23


In [None]:
# accessing by index name
print(a["China"])

In [17]:
# with "iloc" attribute
print(a.iloc[4])

15.78


In [18]:
# multiple elements with index name
print(a[["India", "China"]])

India    34.23
China    97.12
Name: Brics Nations GDP, dtype: float64


In [19]:
# multiple elements with "iloc"
print(a.iloc[[0, 4]])

Brasil    35.46
SA        15.78
Name: Brics Nations GDP, dtype: float64


#### Conditional Selection

In [21]:
a = pd.Series([35.46, 78.89, 34.23, 97.12, 15.78], 
             index = ["Brasil", "Russia", "India", "China", "SA"], name = "Brics Nations GDP")
a

Brasil    35.46
Russia    78.89
India     34.23
China     97.12
SA        15.78
Name: Brics Nations GDP, dtype: float64

In [23]:
a[a > 50]

Russia    78.89
China     97.12
Name: Brics Nations GDP, dtype: float64
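The expression `a > 50` is itself a boolean Series, and that mask is what selects the rows. A small sketch of this, using the same Series as above, also shows how two conditions can be combined (the thresholds 30 and 80 are arbitrary values chosen for illustration):

```python
import pandas as pd

a = pd.Series([35.46, 78.89, 34.23, 97.12, 15.78],
              index=["Brasil", "Russia", "India", "China", "SA"],
              name="Brics Nations GDP")

# The comparison returns a boolean Series with the same index as a.
mask = a > 50
print(mask)

# Combine conditions with & (and), | (or), ~ (not).
# Each condition needs its own parentheses.
mid_range = a[(a > 30) & (a < 80)]
print(mid_range)
```

Python's `and` and `or` keywords do not work element-wise on a Series, which is why the `&` and `|` operators are used here.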

### Modifying Series

In [24]:
a = pd.Series([35.46, 78.89, 34.23, 97.12, 15.78], 
             index = ["Brasil", "Russia", "India", "China", "SA"], name = "Brics Nations GDP")
a

Brasil    35.46
Russia    78.89
India     34.23
China     97.12
SA        15.78
Name: Brics Nations GDP, dtype: float64

In [25]:
a["Brasil"] = 50
a

Brasil    50.00
Russia    78.89
India     34.23
China     97.12
SA        15.78
Name: Brics Nations GDP, dtype: float64

In [26]:
a[a < 50] = 50
a

Brasil    50.00
Russia    78.89
India     50.00
China     97.12
SA        50.00
Name: Brics Nations GDP, dtype: float64

#### Differences between loc and iloc
`loc` is label-based: you select rows and columns by their labels. `iloc` is integer position-based: you select rows and columns by their 0-based integer positions.
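The difference can be sketched on the Series used in this section. One detail worth noting is slicing: a `loc` slice includes its end label, while an `iloc` slice excludes its end position, like ordinary Python slicing.

```python
import pandas as pd

a = pd.Series([35.46, 78.89, 34.23, 97.12, 15.78],
              index=["Brasil", "Russia", "India", "China", "SA"])

by_label = a.loc["India"]    # label-based lookup
by_position = a.iloc[2]      # position-based lookup of the same element
print(by_label, by_position)

# Slices differ: loc includes the end label, iloc excludes the end position.
label_slice = a.loc["Russia":"China"]   # Russia, India, China
position_slice = a.iloc[1:3]            # Russia, India
print(label_slice)
print(position_slice)
```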

## DataFrames
* A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional array or a table with rows and columns.

In [7]:
import pandas as pd

mydataset = {
  'cars': ["BMW", "Volvo", "Ford"],
  'passings': [3, 7, 2]
}
print(type(mydataset), mydataset)
df = pd.DataFrame(mydataset)
print(df)

<class 'dict'> {'cars': ['BMW', 'Volvo', 'Ford'], 'passings': [3, 7, 2]}
    cars  passings
0    BMW         3
1  Volvo         7
2   Ford         2


In [14]:
import pandas as pd
data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}
myvar = pd.DataFrame(data)
print(myvar)

   calories  duration
0       420        50
1       380        40
2       390        45


### Accessing Elements
#### By index value

In [16]:
import pandas as pd

mydataset = {
  'cars': ["BMW", "Volvo", "Ford"],
  'passings': [3, 7, 2]
}
df = pd.DataFrame(mydataset)
print(df)
print(df.loc[0])

    cars  passings
0    BMW         3
1  Volvo         7
2   Ford         2
cars        BMW
passings      3
Name: 0, dtype: object


In [17]:
print(df.loc[[0, 2]])

   cars  passings
0   BMW         3
2  Ford         2


#### By index name

In [19]:
import pandas as pd
data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])

print(df)

      calories  duration
day1       420        50
day2       380        40
day3       390        45


In [20]:
print(df.loc["day2"])

calories    380
duration     40
Name: day2, dtype: int64
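On a DataFrame, `loc` and `iloc` also accept a column selector after the row selector. A minimal sketch with the same data as above:

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

# A single cell: row selector first, then column selector.
cell_by_label = df.loc["day2", "calories"]   # row label, column label
cell_by_position = df.iloc[1, 0]             # row position, column position
print(cell_by_label, cell_by_position)

# Several rows and one column, selected by label.
# Passing lists for both selectors keeps the result a DataFrame.
two_days = df.loc[["day1", "day3"], ["duration"]]
print(two_days)
```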
