# NumPy
Pandas is built on top of NumPy. As a result, pandas shares many of its notations and conventions and follows the same standards, so a solid foundation in NumPy is important for mastering pandas. Here are some resources for learning NumPy.

- [Python & Numpy Tutorial by Stanford](https://cs231n.github.io/python-numpy-tutorial/) - It's really great and covers things from the ground up.
- [NumPy Docs](https://www.numpy.org/devdocs/user/quickstart.html) - for people who want to be legends
- [Machine Learning Plus - Numpy tutorial(part 1)](https://www.machinelearningplus.com/python/numpy-tutorial-part1-array-python-examples/) - Covers the basic ideas one after the other. You will learn to do a lot of basic and frequently used things.
- [Machine Learning Plus - Numpy tutorial(part 2)](https://www.machinelearningplus.com/python/numpy-tutorial-python-part2/)
- [Machine Learning Plus - Numpy Exercise](https://www.machinelearningplus.com/python/101-numpy-exercises-python/)
- [More Numpy Exercise](https://pynative.com/python-numpy-exercise/)
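
To make the link between the two libraries concrete, here is a minimal sketch that wraps a NumPy array in a pandas Series and converts it back. The variable names (`arr`, `s`, `back`) are made up for this illustration.

```python
import numpy as np
import pandas as pd

arr = np.array([10, 20, 30])      # a plain NumPy array
s = pd.Series(arr)                # wrap it in a pandas Series

back = s.to_numpy()               # get the values back as a NumPy array
print(type(back))                 # <class 'numpy.ndarray'>

# NumPy-style vectorised arithmetic and functions work on a Series too
print((s * 2).tolist())           # [20, 40, 60]
print(np.sqrt(s).round(2).tolist())
```

This is why habits learned in NumPy (vectorised operations, dtypes, slicing) carry over to pandas.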

NumPy is huge. Mastering everything that NumPy offers should never be the goal. Instead, I advise you to spend enough time with NumPy to feel confident about it. Let's get going with pandas :)

# Pandas
Note: Before starting this notebook, please go through the ppt provided to you.

Python has long been great for data munging and preparation, but less so for data analysis and modeling. Pandas helps fill this gap, enabling you to carry out your entire data analysis workflow in Python without having to switch to a more domain-specific language like R.

Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.

There are two main data structures in pandas: **Series** and **DataFrame**. We will learn about them one by one.
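
As a quick preview of how the two structures differ, the sketch below builds one of each. The names `ages` and `people` and the sample values are chosen only for this example.

```python
import pandas as pd

ages = pd.Series([25, 26, 22], name='age')      # one-dimensional, labelled
people = pd.DataFrame({'name': ['karan', 'arjun', 'kiran'],
                       'age': [25, 26, 22]})    # two-dimensional table

print(ages.ndim, people.ndim)     # 1 2
print(people.shape)               # (3, 2)
print(type(people['age']))        # each column of a DataFrame is a Series
```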

In [0]:
from google.colab import drive
import os
drive.mount('/content/drive')
os.chdir('drive/My Drive/courses/FML/2. Data Science/1. Data Analysis/2. Pandas')
!ls


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
'1. pandas_intro.ipynb'
'2. pandas series.ipynb'
'3. Pandas DataFrame.ipynb'
'4. Selecting Subsets with [ ], .loc and .iloc.ipynb'
'5. Boolean Indexing.ipynb'
'6. Assigning subsets of data.ipynb'
'7. Other Important concepts in Pandas.ipynb'
'8. Groupby.ipynb'
'capstone projects'
 data
 images
'pandas from numpy.pptx'
'Pandas Solutions(Part 4-6).ipynb'
'samples codes'


In [0]:
import pandas as pd # to work with pandas you will have to import it

## Pandas Series

#### From a Python list

In [2]:
colors = ['Red', 'Blue', 'Green']
pd.Series(colors) # a Series is like a labelled, one-dimensional np.array()
# the string values are stored with the object dtype

0      Red
1     Blue
2    Green
dtype: object

The **object** data type is the one data type that is unlike the others. A column that is of object data type may contain values that are of any valid Python object. Typically, when a column is of the object data type, it signals that the entire column is **strings**. This isn't necessarily the case as it is possible for these columns to contain a mixture of integers, booleans, strings, or other, even more complex Python objects such as lists or dictionaries. The object data type is a **catch-all for columns that pandas doesn’t recognize as any other specific type**.
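
A small sketch of the catch-all behaviour described above: a Series holding a mixture of Python objects gets the object dtype, and each element keeps its own Python type. The values are arbitrary. Note that, depending on the pandas version, a column containing only strings may be reported with a dedicated string dtype instead of object, so this example deliberately uses mixed values.

```python
import pandas as pd

# integers, strings, booleans, a list and a dict in one Series
mixed = pd.Series([1, 'two', True, [3, 4], {'k': 'v'}])

print(mixed.dtype)                         # object
print([type(v).__name__ for v in mixed])   # ['int', 'str', 'bool', 'list', 'dict']
```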

In [3]:
numbers = [1, 2, 3]
pd.Series(numbers)

0    1
1    2
2    3
dtype: int64

#### From Dictionary

In [4]:
colors = {'Apple': 'Red',
          'Mango': 'Yellow',
          'Orange': 'Orange',
          'Grapes': 'Green'}
s = pd.Series(colors)
s

Apple        Red
Mango     Yellow
Orange    Orange
Grapes     Green
dtype: object

In [5]:
s.index

Index(['Apple', 'Mango', 'Orange', 'Grapes'], dtype='object')

In [6]:
s = pd.Series(['Red', 'Yellow', 'Orange', 'Green'], 
              index=['Apple', 'Mango', 'Orange', 'Grapes'])
s

Apple        Red
Mango     Yellow
Orange    Orange
Grapes     Green
dtype: object

In [7]:
s['Apple'], s.iloc[0]  # by label, then by position (plain s[0] on a labelled index is deprecated)

('Red', 'Red')
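
The lookup above reaches the same element once by label and once by position. A minimal sketch that makes the two kinds of access explicit with `.loc` (labels) and `.iloc` (integer positions):

```python
import pandas as pd

s = pd.Series(['Red', 'Yellow', 'Orange', 'Green'],
              index=['Apple', 'Mango', 'Orange', 'Grapes'])

print(s.loc['Mango'])                        # by label    -> Yellow
print(s.iloc[1])                             # by position -> Yellow
print(s.loc[['Apple', 'Grapes']].tolist())   # several labels -> ['Red', 'Green']
print(s.iloc[-1])                            # last element -> Green
```

Using `.loc` and `.iloc` keeps the intent unambiguous, which matters most when the index itself consists of integers.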

## Pandas DataFrame

In [0]:
p1 = pd.Series({'Name': 'karan',
                'age': 25,
                'gender': 'male'})
p2 = pd.Series({'Name': 'arjun',
                'age': 26,
                'gender': 'male'})
p3 = pd.Series({'Name': 'kiran',
                'age': 22,
                'gender': 'female'})


In [9]:
p3

Name       kiran
age           22
gender    female
dtype: object

In [10]:
df = pd.DataFrame([p1,p2,p3],index=['person1', 'person2', 'person3'])
df

          Name  age  gender
person1  karan   25    male
person2  arjun   26    male
person3  kiran   22  female
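
In the cell above, each Series becomes one row of the DataFrame and the keys of the Series become the column names. The sketch below repeats the construction with two of the rows and inspects the result.

```python
import pandas as pd

p1 = pd.Series({'Name': 'karan', 'age': 25, 'gender': 'male'})
p2 = pd.Series({'Name': 'arjun', 'age': 26, 'gender': 'male'})
df = pd.DataFrame([p1, p2], index=['person1', 'person2'])

print(list(df.columns))           # ['Name', 'age', 'gender']
print(list(df.index))             # ['person1', 'person2']
print(df.loc['person2', 'age'])   # 26
print(df['age'].tolist())         # [25, 26]
```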


In [0]:
import numpy as np

In [0]:
random_data = np.random.randint(10,100,25).reshape(5,5)

In [13]:
random_data

array([[43, 67, 40, 54, 30],
       [90, 99, 39, 97, 98],
       [14, 29, 42, 43, 16],
       [62, 83, 84, 10, 18],
       [73, 69, 91, 74, 87]])

In [14]:
list('ABCDE')

['A', 'B', 'C', 'D', 'E']

In [16]:
pd.DataFrame(random_data,
             index=['monday', 'tuesday', 'wednesday', 'thursday', 'friday'],
             columns=list('ABCDE'))

            A   B   C   D   E
monday     43  67  40  54  30
tuesday    90  99  39  97  98
wednesday  14  29  42  43  16
thursday   62  83  84  10  18
friday     73  69  91  74  87
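
Because `np.random.randint` draws new numbers on every run, your table will not match the one shown above. If you want the same data each time, one option is a seeded generator. This is a sketch: the seed value 0 and the names `rng`, `data` and `week` are arbitrary choices for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)            # arbitrary seed, chosen for this example
data = rng.integers(10, 100, size=(5, 5))      # integers in the range [10, 100)

days = ['monday', 'tuesday', 'wednesday', 'thursday', 'friday']
week = pd.DataFrame(data, index=days, columns=list('ABCDE'))
print(week.shape)                              # (5, 5)
```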


## Reading dataset

In [0]:
clip = pd.read_clipboard() # reads the table from your clipboard; needs a local system clipboard, so it fails on hosted notebooks such as Colab

PyperclipException: ignored

In [0]:
clip

NameError: ignored

You can see the different types of files pandas can read in the cell below.

In [0]:
#uncomment the line below and press tab after 'pd.read_' 
#pd.read_
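
If tab completion is not available in your environment, the same list can be produced in code. A minimal sketch (the name `readers` is made up here) that collects every top-level pandas function whose name starts with `read_`:

```python
import pandas as pd

# every name in the pandas namespace that begins with 'read_'
readers = sorted(name for name in dir(pd) if name.startswith('read_'))
print(readers)
```

The exact list depends on the pandas version you have installed.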

In [0]:
df = pd.read_csv("data/sample_data.csv", index_col=0) # reading a csv file

In [0]:
df

          Name  age  gender
person1  karan   25    male
person2  arjun   26    male
person3  kiran   22  female


Pandas is not limited to reading CSV files and the clipboard. It can read data from many sources and formats, such as Excel, JSON, SQL and HTML. To learn more about it, you can refer to [this introduction to pandas data structures](http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures). From here on, we will always be reading our data from some source or the other.
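
A source does not even have to be a file on disk. The sketch below reads CSV text from an in-memory buffer with `io.StringIO`. The CSV content is an assumption modelled on the table displayed earlier, not the actual contents of `data/sample_data.csv`.

```python
import io
import pandas as pd

# assumed CSV text, written to mirror the table shown earlier
csv_text = """,Name,age,gender
person1,karan,25,male
person2,arjun,26,male
person3,kiran,22,female
"""

# index_col=0 tells pandas to use the first column as the row labels
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(df.index))             # ['person1', 'person2', 'person3']
print(df.loc['person3', 'Name'])  # kiran
print(df['age'].sum())            # 73
```

Without `index_col=0`, the first column would be loaded as an ordinary column named `Unnamed: 0` and the rows would be labelled 0, 1, 2.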