## Pandas

Pandas is a Python library for analyzing, cleaning, exploring, and manipulating data.

In [2]:
import pandas as pd

mydataset = {
    "cars": ["BMW", "Honda", "Audi"], 
    "ratings": ["8", "3", "6"]
}

x = pd.DataFrame(mydataset)
print(x)

    cars ratings
0    BMW       8
1  Honda       3
2   Audi       6
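
The ratings above are written as strings, so pandas does not treat that column as numeric. A minimal sketch of converting it, assuming the same `mydataset` dictionary; the names `cars_df` and `numeric_ratings` are illustrative and not part of the original notebook.

```python
import pandas as pd

mydataset = {
    "cars": ["BMW", "Honda", "Audi"],
    "ratings": ["8", "3", "6"],
}
cars_df = pd.DataFrame(mydataset)

# Convert the string ratings to integers so arithmetic behaves numerically
numeric_ratings = cars_df["ratings"].astype(int)
print(numeric_ratings.sum())
```

If the ratings are entered as plain integers in the dictionary, the conversion step is unnecessary.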


### Series

A pandas Series is a one-dimensional labeled array, comparable to a single column of a table.

In [6]:
a = [3, 5, 2]
var = pd.Series(a)
print(var)

# Using our own labels for each entry
var = pd.Series(a, index=["x", "y", "z"])
print(var)

# Using dictionary
a = {"a": 3, "b": 7, "c": 9}
print(pd.Series(a))

0    3
1    5
2    2
dtype: int64
x    3
y    5
z    2
dtype: int64
a    3
b    7
c    9
dtype: int64
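
Once a Series has custom labels, those labels can be used to look values up. A short sketch, reusing the `x`, `y`, `z` labels from the cell above; the name `subset` is illustrative.

```python
import pandas as pd

var = pd.Series([3, 5, 2], index=["x", "y", "z"])

# Look up a single value by its label
print(var["y"])

# Select several labels at once; the result is another Series
subset = var[["x", "z"]]
print(subset)
```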


### DataFrames

Datasets in pandas are usually two-dimensional tables with rows and columns, called DataFrames.

The `loc` attribute selects one or more rows by label. Row labels can be set with the `index` argument when constructing the DataFrame.

In [18]:
data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

x = pd.DataFrame(data, index=[1, 2, 3])
print(x)

df = pd.DataFrame(data, index=["day1", "day2", "day3"])
print(df, "\n", df.loc[["day2"]])

   calories  duration
1       420        50
2       380        40
3       390        45
      calories  duration
day1       420        50
day2       380        40
day3       390        45 
       calories  duration
day2       380        40
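
The shape of what `loc` returns depends on how the label is passed. The sketch below, using the same `data` dictionary, contrasts a single label with a list of labels; the names `row` and `rows` are illustrative.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

# A single label returns that row as a Series
row = df.loc["day2"]

# A list of labels returns a DataFrame, even when the list holds one label
rows = df.loc[["day1", "day3"]]

print(type(row).__name__, type(rows).__name__)
print(row["calories"])
```

This is why the cell above uses `df.loc[["day2"]]` with double brackets: the result keeps its table layout when printed.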


### CSV

`read_csv()` loads a CSV file into a DataFrame. When a DataFrame has more rows than `pd.options.display.max_rows`, printing it shows a truncated view; `to_string()` prints every row.

In [26]:
df = pd.read_csv('bike.csv')
# print(df.to_string())  # uncomment to print every row regardless of max_rows
print(df)

# Changing the maximum number of rows displayed
print(pd.options.display.max_rows)
pd.options.display.max_rows = 999
print(df)

     Gender  Age  Salary  Purchase Duke
0      Male   19   19000              0
1      Male   35   20000              0
2    Female   26   43000              0
3    Female   27   57000              0
4      Male   19   76000              0
5      Male   27   58000              0
6    Female   27   84000              0
7    Female   32  150000              1
8      Male   25   33000              0
9    Female   35   65000              0
10   Female   26   80000              0
11   Female   26   52000              0
12     Male   20   86000              0
13     Male   32   18000              0
14     Male   18   82000              0
15     Male   29   80000              0
16     Male   47   25000              1
17     Male   45   26000              1
18     Male   46   28000              1
19   Female   48   29000              1
20     Male   45   22000              1
21   Female   47   49000              1
22     Male   48   41000              1
23   Female   45   22000              1
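
Assigning to `pd.options.display.max_rows` changes the setting for the rest of the session. As a possible alternative, `pd.option_context` applies the change only inside a `with` block. The sketch below reads from an in-memory string so it runs without a file; `csv_text` and its contents are a hypothetical stand-in for `bike.csv`.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file on disk
csv_text = "Gender,Age,Salary\nMale,19,19000\nFemale,26,43000\nMale,35,20000\n"
df = pd.read_csv(io.StringIO(csv_text))

# to_string() renders every row regardless of display.max_rows
full_text = df.to_string()
print(full_text)

# option_context changes display.max_rows only inside the with block
default_rows = pd.get_option("display.max_rows")
with pd.option_context("display.max_rows", 999):
    inside_rows = pd.get_option("display.max_rows")
    print(df)
after_rows = pd.get_option("display.max_rows")
```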


### Analyzing Data

- `head()` returns the column headers and the first `n` rows (5 by default).
- `tail()` returns the column headers and the last `n` rows (5 by default).

In [31]:
a = pd.read_csv('data.csv')
print(a.head(10))
print(a.tail(7))

   Duration  Pulse  Maxpulse  Calories
0        60    110       130     409.1
1        60    117       145     479.0
2        60    103       135     340.0
3        45    109       175     282.4
4        45    117       148     406.0
5        60    102       127     300.0
6        60    110       136     374.0
7        45    104       134     253.3
8        30    109       133     195.1
9        60     98       124     269.0
     Duration  Pulse  Maxpulse  Calories
162        45     95       130     270.0
163        45    100       140     280.9
164        60    105       140     290.8
165        60    110       145     300.0
166        60    115       145     310.2
167        75    120       150     320.4
168        75    125       150     330.4
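
The default row count for `head()` and `tail()` can be checked on a small frame. The sketch below builds a hypothetical 20-row DataFrame instead of reading `data.csv`; the column name `value` is illustrative.

```python
import pandas as pd

# Hypothetical data: 20 rows with a single column
df = pd.DataFrame({"value": range(20)})

first_default = df.head()   # no argument: first 5 rows
last_default = df.tail()    # no argument: last 5 rows
first_three = df.head(3)    # explicit count: first 3 rows

print(first_three)
print(last_default)
```

Both methods keep the original index labels, which is why the `tail(7)` output above starts at row 162.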
