# Pandas and NumPy:
- Python is widely used for data analytics.
- The name "Pandas" is derived from "panel data" and is also a play on "Python data analysis".
- Pandas can read many file formats, including .csv, which stands for "comma-separated values".

## Pandas Series:
What is a Series?
- A Pandas Series is like a column in a table.
- It is a one-dimensional array holding data of any type.

## The Series constructor in the Pandas library

In [5]:
import pandas as pd

ser=[1,2,3]
var=pd.Series(ser)
print(var)

0    1
1    2
2    3
dtype: int64


### Create a Series of any data type with any index

In [6]:
ser=["Mon", "Tue", "Wed"]
var=pd.Series(ser, index=["day1", "day2", "day3"])
print(var["day2"])

Tue


### Create a DataFrame from a dictionary by using DataFrame()

In [7]:
mydataset = {
    "cars": ["BMW", "Volvo", "Ford"],
    "passings": [3,7,2],
}

myvar=pd.DataFrame(mydataset)
print(myvar)

    cars  passings
0    BMW         3
1  Volvo         7
2   Ford         2
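
Since a Series is like a column in a table, selecting one column of a DataFrame gives a Series back. The sketch below rebuilds the same table; the names `df_cars` and `passings` are only illustrative.

```python
import pandas as pd

mydataset = {
    "cars": ["BMW", "Volvo", "Ford"],
    "passings": [3, 7, 2],
}
df_cars = pd.DataFrame(mydataset)

# Selecting a single column with [] returns a Series
passings = df_cars["passings"]
print(type(passings).__name__)
print(passings.tolist())
```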


## Create Labels
With the index argument, you can name your own labels.

In [11]:
# Create your own labels:
import pandas as pd

a=[1,7,2]
myvar = pd.Series(a, index = ["x", "y", "z"])

print(myvar)

print("\nUsing labels:")
print(myvar["x"])

x    1
y    7
z    2
dtype: int64

Using labels:
1
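
Once a Series has custom labels, a value can be reached either by its label or by its position. A minimal sketch using the `.loc` (label) and `.iloc` (position) accessors; the variable names are illustrative.

```python
import pandas as pd

labeled = pd.Series([1, 7, 2], index=["x", "y", "z"])

by_label = labeled.loc["y"]     # label-based lookup
by_position = labeled.iloc[1]   # position-based lookup (0-indexed)

print(by_label, by_position)
```

Using `.loc` and `.iloc` explicitly avoids any ambiguity about whether a value inside `[ ]` is meant as a label or as a position.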


## Key/Value Objects as Series:
You can also use a key/value object, like a dictionary, when creating a Series.

In [18]:
# Create a simple Pandas Series from a dictionary
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3":390}

var = pd.Series(calories)

print(var)

print("printing with labels:")
print(var["day1"])

day1    420
day2    380
day3    390
dtype: int64
printing with labels:
420
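
When a Series is built from a dictionary, the `index` argument can also be used to keep only some of the keys. A small sketch, reusing the same calorie data; `subset` is an illustrative name.

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

# Only the keys listed in index end up in the Series
subset = pd.Series(calories, index=["day1", "day2"])

print(subset)
```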


## DataFrames 
- Data sets in Pandas are usually two-dimensional tables, called DataFrames.
- A Series is like a column; a DataFrame is the whole table.

In [13]:
# Create a DataFrame from a dictionary of lists:
import pandas as pd

data = {
    "calories" : [420, 380, 390],
    "duration" : [50, 40, 45]
}

myvar = pd.DataFrame(data)
print(myvar)

   calories  duration
0       420        50
1       380        40
2       390        45
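
A DataFrame can also be assembled from Series objects directly, which makes the "column vs. whole table" relationship concrete. This is a sketch with illustrative variable names, using the same numbers as above.

```python
import pandas as pd

calories = pd.Series([420, 380, 390])
duration = pd.Series([50, 40, 45])

# Each Series becomes one column; the dictionary keys become the column names
combined = pd.DataFrame({"calories": calories, "duration": duration})

print(combined)
```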


## Locate Row
- As you can see from the result above, the DataFrame is like a table with rows and columns.
- Pandas uses the loc attribute to return one or more specified row(s).
- Note: When a list of labels is passed, as in loc[[0, 1]], the result is a Pandas DataFrame; a single label, as in loc[0], returns a Series.

In [25]:
print("Return row 0")
print(myvar.loc[0])
print("Return row 0 and 1")
print(myvar.loc[[0, 1]])

Return row 0
calories    420
duration     50
Name: 0, dtype: int64
Return row 0 and 1
   calories  duration
0       420        50
1       380        40
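
The loc attribute works with any row labels, not only the default 0, 1, 2. The sketch below gives the rows hypothetical labels ("day1" and so on, chosen for illustration) and shows the difference between passing one label and passing a list of labels.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
# Hypothetical row labels, used only for this illustration
named = pd.DataFrame(data, index=["day1", "day2", "day3"])

one_row = named.loc["day2"]              # single label -> Series
two_rows = named.loc[["day1", "day3"]]   # list of labels -> DataFrame

print(type(one_row).__name__)
print(type(two_rows).__name__)
```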


## Read CSV
- A simple way to store big data sets is to use CSV files (comma-separated values).
- CSV files contain plain text and are a well-known format that can be read by almost any tool, including Pandas.
- Note: if the .csv file is in a different directory/folder, make sure you give the full path.
- Note: use to_string() to print the entire DataFrame.

In [29]:
# Load the CSV into a DataFrame
df = pd.read_csv("book1.csv")  # raises FileNotFoundError if the file does not exist
print(df.to_string())

   Book number             Book Name
0            1           Animal Farm
1            2                  1984
2            3        Fahrenheit 451
3            4         Atomic habits
4            5      Brave new world.
5            6   Alice in Wonderland
6            7       Sherlock Holmes
7            8          Harry Potter
8            9     Lord of the Rings
9           10  Song of Ice and Fire
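
To try read_csv() without having a file on disk, the CSV text can be wrapped in `io.StringIO`, which read_csv() accepts in place of a path. This is only a sketch: the three rows are copied from the table above, and `no_such_file.csv` is a made-up name assumed not to exist.

```python
import io
import pandas as pd

# A few rows in CSV form, held in memory instead of in a file
csv_text = "Book number,Book Name\n1,Animal Farm\n2,1984\n3,Fahrenheit 451\n"
books = pd.read_csv(io.StringIO(csv_text))
print(books.to_string())

# Handling a missing file (the file name here is hypothetical)
try:
    pd.read_csv("no_such_file.csv")
    missing = False
except FileNotFoundError:
    missing = True
print("File missing:", missing)
```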


## Printing DataFrame

In [30]:
print(df.head())

   Book number         Book Name
0            1       Animal Farm
1            2              1984
2            3    Fahrenheit 451
3            4     Atomic habits
4            5  Brave new world.


In [31]:
print(df.tail())

   Book number             Book Name
5            6   Alice in Wonderland
6            7       Sherlock Holmes
7            8          Harry Potter
8            9     Lord of the Rings
9           10  Song of Ice and Fire
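
Both head() and tail() show 5 rows by default and accept an optional row count. A minimal sketch on a small stand-in DataFrame (the column name `n` is illustrative):

```python
import pandas as pd

numbers = pd.DataFrame({"n": range(1, 11)})

first_three = numbers.head(3)   # first 3 rows
last_two = numbers.tail(2)      # last 2 rows

print(first_three)
print(last_two)
```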


## Using info() on DataFrame:

In [32]:
df.info()  # info() prints the summary itself and returns None, so print() is not needed

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Book number  10 non-null     int64 
 1   Book Name    10 non-null     object
dtypes: int64(1), object(1)
memory usage: 288.0+ bytes


## Printing large DataFrames:
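- If a DataFrame has more rows than pd.options.display.max_rows (60 by default), only the first 5 and last 5 rows are printed, with "..." in the middle. The 10-row DataFrame below is small enough to be printed in full.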

In [33]:
print(df) 

   Book number             Book Name
0            1           Animal Farm
1            2                  1984
2            3        Fahrenheit 451
3            4         Atomic habits
4            5      Brave new world.
5            6   Alice in Wonderland
6            7       Sherlock Holmes
7            8          Harry Potter
8            9     Lord of the Rings
9           10  Song of Ice and Fire
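
The book table is too small to be truncated, so the sketch below builds a 100-row stand-in (the name `big` and the column `n` are illustrative) to show the shortened output next to the full output from to_string(). It assumes the display options are at their defaults.

```python
import pandas as pd

big = pd.DataFrame({"n": range(100)})

# repr(big) is the text that print(big) would show
text = repr(big)
print(text)

# to_string() always renders every row
full_text = big.to_string()
print(len(text.splitlines()), "lines vs", len(full_text.splitlines()), "lines")
```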
