___

<p style="text-align: center;"><img src="https://docs.google.com/uc?id=1lY0Uj5R04yMY3-ZppPWxqCr5pvBLYPnV" class="img-fluid" 
alt="CLRSWY"></p>

## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#9d4f8c; font-size:120%; text-align:center; border-radius:10px 10px;">Way to Reinvent Yourself</p>


## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:center; border-radius:10px 10px;">Practical Tutorial on Data Manipulation with Numpy and Pandas in Python</p>

<img src=https://i.ibb.co/8NdjfdZ/Num-Py-logo.png width="700" height="200">





## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">What Is NumPy and Why Do We Need It?</p>




## What is NumPy? 

NumPy is a linear algebra library for Python. It is central to data science in Python because almost all of the libraries in the PyData ecosystem rely on NumPy as one of their main building blocks.

NumPy (short for Numerical Python) is a package for computation on **homogeneous n-dimensional arrays**. In NumPy, dimensions are called axes.

**Why do we need NumPy?**

Why do we need NumPy when Python already has lists? Because we cannot apply an operation to all the elements of two lists directly. For example, we cannot multiply two lists directly; we have to do it element by element. This is where NumPy comes into play.
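
To make the difference concrete, here is a minimal sketch (the lists `a` and `b` are arbitrary example values) comparing element-wise multiplication with plain lists and with NumPy arrays:

```python
import numpy as np

a = [1, 2, 3]
b = [4, 5, 6]

# With plain lists, `a * b` raises TypeError, so the product is built element by element
list_product = [x * y for x, y in zip(a, b)]   # [4, 10, 18]

# With NumPy arrays, the same operation is a single vectorized expression
arr_product = np.array(a) * np.array(b)        # elements 4, 10, 18

# `*` on a list means repetition, not arithmetic
repeated = a * 2                               # [1, 2, 3, 1, 2, 3]
doubled = np.array(a) * 2                      # elements 2, 4, 6
```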

NumPy is also very fast, because its core is written in C. For more on why you would want to use arrays instead of lists, check out this great [StackOverflow post](http://stackoverflow.com/questions/993984/why-numpy-instead-of-python-lists).

### NumPy at a glance

**POWERFUL N-DIMENSIONAL ARRAYS**<br>
Fast and versatile, the NumPy vectorization, indexing, and broadcasting concepts are the de-facto standards of array computing today.<br>
**NUMERICAL COMPUTING TOOLS**<br>
NumPy offers comprehensive mathematical functions, random number generators, linear algebra routines, Fourier transforms, and more.<br>
**INTEROPERABLE**<br>
NumPy supports a wide range of hardware and computing platforms, and plays well with distributed, GPU, and sparse array libraries.<br>
**PERFORMANT**<br>
The core of NumPy is well-optimized C code. Enjoy the flexibility of Python with the speed of compiled code.<br>
**EASY TO USE**<br>
NumPy’s high level syntax makes it accessible and productive for programmers from any background or experience level.<br>
**OPEN SOURCE**<br>
Distributed under a liberal BSD license, NumPy is developed and maintained publicly on GitHub by a vibrant, responsive, and diverse community. [Source](https://numpy.org/)<br>


## Why do we need it?
NumPy is needed for logical and mathematical computations on arrays and matrices, and it performs these operations far more efficiently than Python lists.

## Advantages of NumPy
1. NumPy arrays take less space.
The core of NumPy is its arrays. One of the main advantages of NumPy arrays is that they use less memory and provide better runtime speed than similar Python data structures (lists and tuples). A Python list of numbers stores a pointer to a separate Python object for every element, while a NumPy array stores the raw values in a single contiguous block, so the same data takes considerably less space. Arrays are also easy to access for reading and writing.
2. Speed is also excellent: NumPy performs computations faster than Python lists.
3. NumPy supports scientific functions such as linear algebra routines, which help us solve systems of linear equations.
4. NumPy supports vectorized operations, like element-wise addition and multiplication, computing the Kronecker product, etc. Python lists do not support these features.
5. It is a very good substitute for MATLAB, Octave, etc., as it provides similar functionality with faster development and less mental overhead (Python is easy to write and understand).
6. As it is open-source, it doesn’t cost anything, and it uses a very popular programming language, Python, which has high-quality libraries for almost every task. Also, it is easy to connect the existing C code to the Python interpreter.
7. NumPy is very good for data analysis.
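
Several of these advantages can be checked directly. The sketch below, using arbitrary example sizes and values, compares a rough memory estimate for a list with an array's `nbytes`, solves a small linear system with `np.linalg.solve`, and builds a Kronecker product with `np.kron`. The list estimate is only approximate (CPython shares small integer objects, so summing `sys.getsizeof` per element overcounts), so treat it as an order-of-magnitude comparison.

```python
import sys
import numpy as np

n = 1_000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)

# Rough estimate: the list's pointer storage plus one Python int object per element
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)
# An int64 array stores the raw values in one buffer: 8 bytes per element
array_bytes = arr.nbytes
print(list_bytes > array_bytes)   # True

# Linear algebra: solve 3x + y = 9 and x + 2y = 8
A = np.array([[3.0, 1.0], [1.0, 2.0]])
rhs = np.array([9.0, 8.0])
solution = np.linalg.solve(A, rhs)   # x = 2, y = 3

# Kronecker product: identity (2x2) with a 2x2 block gives a 4x4 block-diagonal matrix
K = np.kron(np.eye(2), np.array([[1, 2], [3, 4]]))
```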

## Disadvantages of NumPy

1. Handling “NaN” in NumPy: “NaN” stands for “not a number” and is commonly used to mark missing values. NumPy supports NaN, but only in floating-point arrays, and NaN compares unequal to every value, including itself, so comparisons involving NaN need care (for example, `np.isnan` instead of `==`).
2. Requires a contiguous allocation of memory: because the data is stored in one contiguous block, insertion and deletion are costly, since NumPy has to allocate a new array and copy the elements over.
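
Both drawbacks can be seen in a few lines. This sketch (the example values are arbitrary) shows that `==` cannot detect NaN while `np.isnan` can, that `np.sum` propagates NaN while `np.nansum` skips it, that an integer array rejects NaN, and that `np.insert` returns a new array instead of growing the original in place:

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0])

# NaN is not equal to anything, including itself, so `==` cannot find it
print(values == np.nan)      # [False False False]
print(np.isnan(values))      # [False  True False]

# NaN propagates through ordinary reductions; the nan-aware variants skip it
print(np.sum(values))        # nan
print(np.nansum(values))     # 4.0

# Integer arrays cannot hold NaN at all
int_nan_rejected = False
ints = np.array([1, 2])
try:
    ints[0] = np.nan
except ValueError:
    int_nan_rejected = True

# np.insert allocates and returns a new array; the original is unchanged
original = np.array([1, 2, 3])
extended = np.insert(original, 1, 99)   # elements 1, 99, 2, 3
print(original)              # [1 2 3]
```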

![Capture1.PNG](https://i.ibb.co/FY1q7Xh/uses-of-numpy.png)

[Numpy Source 01](https://www.educba.com/what-is-numpy-in-python/),
[Numpy Source 02](https://www.javatpoint.com/numpy-tutorial),
[Numpy Source 03](https://techvidvan.com/tutorials/python-numpy-tutorial/),
[Numpy Source 04](https://medium.com/analytics-vidhya/introduction-to-numpy-16a6efaffdd7),
[Numpy Source 05](https://data-flair.training/blogs/python-numpy-tutorial/),
[Numpy Source 06](https://www.quora.com/In-Python-what-is-NumPy-How-is-it-used),
[Numpy Source 07](https://fgnt.github.io/python_crashkurs_doc/include/numpy.html),
[Numpy Source 08](https://towardsdatascience.com/a-hitchhiker-guide-to-python-numpy-arrays-9358de570121),
[Numpy Source 09](https://scipy-lectures.org/intro/numpy/array_object.html),
[Numpy Source 10](https://www.educba.com/introduction-to-numpy/)

We will cover only the basics of NumPy. To get started, we need to install it (for example, with `pip install numpy`)!

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:left; border-radius:10px 10px;">Starting with Numpy</p>

## 1) Load the libraries and check their versions, just to make sure we aren't using an older version

In [1]:
import numpy as np
import pandas as pd

np.__version__
#only the last expression in a cell is displayed, so the output below is the pandas version
pd.__version__

'1.3.5'

## 2) Create a list comprising numbers from 0 to 9

In [8]:
L = list(range(10))

## 3) Convert the integers to strings. This style of handling lists is known as list comprehension.
### List comprehension offers a concise, versatile way to handle list-manipulation tasks.

In [11]:
[type(item) for item in L]

[int, int, int, int, int, int, int, int, int, int]

In [12]:
L = [str(c) for c in L]
# ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
L

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In [13]:
[type(item) for item in L]

[str, str, str, str, str, str, str, str, str, str]

## Creating Arrays

#### Unlike lists, NumPy arrays are homogeneous: all of their elements share one data type (integer, float, etc.).
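
Homogeneity has a practical consequence: when you pass mixed values to `np.array`, NumPy picks one common type for all of them. A small sketch with arbitrary example values:

```python
import numpy as np

# Mixing ints and floats: every element is upcast to a common float type
mixed_numbers = np.array([1, 2, 3.5])
print(mixed_numbers.dtype)        # float64

# Mixing numbers and strings: every element becomes a string
mixed_types = np.array([1, "a"])
print(mixed_types.dtype.kind)     # U (Unicode string)

# A Python list keeps each element's own type
py_list = [1, "a", 3.5]
print([type(v).__name__ for v in py_list])   # ['int', 'str', 'float']
```

The exact string dtype (for example `<U21`) may vary, which is why the sketch only inspects its kind.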

In [14]:
#creating arrays
np.zeros(10, dtype='int')


array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])


#creating a 3 row x 5 column matrix
np.ones((3,5), dtype=float)


array([[ 1.,  1.,  1.,  1.,  1.],
      [ 1.,  1.,  1.,  1.,  1.],
      [ 1.,  1.,  1.,  1.,  1.]])


#creating a matrix with a predefined value
np.full((3,5),1.23)


array([[ 1.23,  1.23,  1.23,  1.23,  1.23],
      [ 1.23,  1.23,  1.23,  1.23,  1.23],
      [ 1.23,  1.23,  1.23,  1.23,  1.23]])


#create an array with a set sequence
np.arange(0, 20, 2)


array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])


#create an array of even space between the given range of values
np.linspace(0, 1, 5)
array([0.  , 0.25, 0.5 , 0.75, 1.  ])


#create a 3x3 array of values drawn from a normal distribution with mean 0 and standard deviation 1
np.random.normal(0, 1, (3,3))
array([[ 0.72432142, -0.90024075,  0.27363808],
      [ 0.88426129,  1.45096856, -1.03547109],
      [-0.42930994, -1.02284441, -1.59753603]])


#create an identity matrix
np.eye(3)


array([[ 1.,  0.,  0.],
      [ 0.,  1.,  0.],
      [ 0.,  0.,  1.]])


#set a random seed
np.random.seed(0)


x1 = np.random.randint(10, size=6) #one dimension
x2 = np.random.randint(10, size=(3,4)) #two dimension
x3 = np.random.randint(10, size=(3,4,5)) #three dimension


print("x3 ndim:", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)
x3 ndim: 3
x3 shape: (3, 4, 5)
x3 size:  60


## Array Indexing

#### The important thing to remember is that indexing in Python starts at zero.

In [None]:
x1 = np.array([4, 3, 4, 4, 8, 4])
x1

array([4, 3, 4, 4, 8, 4])

#access the value at index zero
x1[0]
4

#access the fifth value (index 4)
x1[4]
8

#get the last value
x1[-1]
4

#get the second last value
x1[-2]
8

#in a multidimensional array, we need to specify row and column index
x2
array([[3, 7, 5, 5],
      [0, 1, 5, 9],
      [3, 0, 5, 0]])


#value in the 3rd row and 4th column (indices start at zero)
x2[2,3]
0

#last value in the 3rd row
x2[2,-1]
0


#replace value at 0,0 index
x2[0,0] = 12
x2
array([[12,  7,  5,  5],
      [ 0,  1,  5,  9],
      [ 3,  0,  5,  0]])

## Array Slicing


Now, we'll learn to access multiple or a range of elements from an array.

In [None]:
x = np.arange(10)
x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


#first five elements (indices 0 to 4)
x[:5]
array([0, 1, 2, 3, 4])


#from index 4 to the end
x[4:]
array([4, 5, 6, 7, 8, 9])


#indices 4 to 6 (the stop index 7 is excluded)
x[4:7]
array([4, 5, 6])


#every second element, starting at index 0 (even indices)
x[ : : 2]
array([0, 2, 4, 6, 8])


#every second element, starting at index 1 (odd indices)
x[1::2]
array([1, 3, 5, 7, 9])


#reverse the array
x[::-1]
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
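
All of the slices above follow one rule, `x[start:stop:step]`, with the same semantics as Python's `range`. The sketch below checks that equivalence for the slices used in this section and adds two negative-step examples (the `7:2:-2` slice is an arbitrary extra example):

```python
import numpy as np

x = np.arange(10)

# x[start:stop:step] picks the same positions as range(start, stop, step);
# `stop` itself is excluded
for start, stop, step in [(0, 5, 1), (4, 10, 1), (4, 7, 1), (0, 10, 2), (1, 10, 2)]:
    assert x[start:stop:step].tolist() == list(range(start, stop, step))

# A negative step walks backwards; x[::-1] starts at the last element
print(x[::-1][:3])     # [9 8 7]
print(x[7:2:-2])       # [7 5 3]
```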

## Array Concatenation


Often we need to combine different arrays. Instead of typing out all of their elements manually, we can use array concatenation to handle such tasks easily.

In [16]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
z = [21,21]
np.concatenate([x, y,z])

array([ 1,  2,  3,  3,  2,  1, 21, 21])

In [None]:
#You can concatenate two or more arrays at once.
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
z = [21,21,21]
np.concatenate([x, y,z])


array([ 1,  2,  3,  3,  2,  1, 21, 21, 21])


#You can also use this function to create 2-dimensional arrays.
grid = np.array([[1,2,3],[4,5,6]])
np.concatenate([grid,grid])


array([[1, 2, 3],
      [4, 5, 6],
      [1, 2, 3],
      [4, 5, 6]])


#use the axis parameter to concatenate row-wise (axis=0, the default) or column-wise (axis=1)
np.concatenate([grid,grid],axis=1)
array([[1, 2, 3, 1, 2, 3],
      [4, 5, 6, 4, 5, 6]])

In [18]:
grid = np.array([[1,2,3],[4,5,6]])
#np.concatenate([grid,grid]) gives the same result, since axis=0 is the default
np.concatenate([grid,grid],axis=0)

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

Until now, we have concatenated arrays with the same number of dimensions. But what if we need to combine a 2D array with a 1D array? In that case np.concatenate raises an error, because all of its inputs must have the same number of dimensions. Instead, we can use np.vstack or np.hstack. Let's see how!

In [None]:
x = np.array([3,4,5])
grid = np.array([[1,2,3],[17,18,19]])
np.vstack([x,grid])
array([[ 3,  4,  5],
      [ 1,  2,  3],
      [17, 18, 19]])


#similarly, np.hstack adds columns to an array
z = np.array([[9],[9]])
np.hstack([grid,z])
array([[ 1,  2,  3,  9],
      [17, 18, 19,  9]])
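
To see why `np.vstack` and `np.hstack` are useful here, the sketch below (reusing the arrays from the cells above) shows that `np.concatenate` rejects a 1-D and a 2-D input together, and that adding a row axis with `np.newaxis` reproduces what `np.vstack` returns:

```python
import numpy as np

x = np.array([3, 4, 5])
grid = np.array([[1, 2, 3], [17, 18, 19]])
z = np.array([[9], [9]])

# np.concatenate requires inputs with the same number of dimensions,
# so joining a 1-D array with a 2-D array raises ValueError
try:
    np.concatenate([x, grid])
    mixed_dims_failed = False
except ValueError:
    mixed_dims_failed = True

# Giving x an explicit row axis (shape (1, 3)) makes plain concatenation work
rows = np.concatenate([x[np.newaxis, :], grid], axis=0)
print(np.array_equal(rows, np.vstack([x, grid])))    # True

# For 2-D inputs, np.hstack joins along axis=1, like concatenate(axis=1)
cols = np.concatenate([grid, z], axis=1)
print(np.array_equal(cols, np.hstack([grid, z])))    # True
```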


## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:left; border-radius:10px 10px;">Let's start with Pandas</p>

![pandas.png](attachment:pandas.png)

In [19]:
#create a DataFrame from a dictionary: the keys become column names and the values become the column values
data = pd.DataFrame({'Country': ['Russia','Colombia','Chile','Equador','Nigeria'],
                    'Rank':[121,40,100,130,11]})
data

| | Country | Rank |
|---|---|---|
| 0 | Russia | 121 |
| 1 | Colombia | 40 |
| 2 | Chile | 100 |
| 3 | Equador | 130 |
| 4 | Nigeria | 11 |


In [20]:
#We can do a quick analysis of any data set using:

In [21]:
data.describe()

| | Rank |
|---|---|
| count | 5.0 |
| mean | 80.4 |
| std | 52.300096 |
| min | 11.0 |
| 25% | 40.0 |
| 50% | 100.0 |
| 75% | 121.0 |
| max | 130.0 |


Remember that by default the describe() method computes summary statistics only for numeric columns. To get an overview of the whole data set (column names, non-null counts, and data types), we can use the info() method.
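
By default `describe()` ignores the text column `Country`. As a sketch, passing `include='all'` adds count, unique, top and freq statistics for object columns as well (the DataFrame is re-created here so the snippet runs on its own):

```python
import pandas as pd

data = pd.DataFrame({'Country': ['Russia', 'Colombia', 'Chile', 'Equador', 'Nigeria'],
                     'Rank': [121, 40, 100, 130, 11]})

# By default describe() summarizes numeric columns only
numeric_summary = data.describe()
print(list(numeric_summary.columns))   # ['Rank']

# include='all' also summarizes object columns (count, unique, top, freq)
full_summary = data.describe(include='all')
print(full_summary)
```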

In [None]:
#Among other things, it shows the data set has 5 rows and 2 columns with their respective names.

In [22]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Country  5 non-null      object
 1   Rank     5 non-null      int64 
dtypes: int64(1), object(1)
memory usage: 208.0+ bytes


In [None]:
#sort the DataFrame by Rank; inplace=False returns a sorted copy, while inplace=True would modify data itself
data.sort_values(by=['Rank'],ascending=True,inplace=False)

| | Country | Rank |
|---|---|---|
| 4 | Nigeria | 11 |
| 1 | Colombia | 40 |
| 2 | Chile | 100 |
| 0 | Russia | 121 |
| 3 | Equador | 130 |


We can sort the data by not just one column but multiple columns as well.

In [23]:
data.sort_values(by=['Country','Rank'],ascending=[True,True],inplace=False)

| | Country | Rank |
|---|---|---|
| 2 | Chile | 100 |
| 1 | Colombia | 40 |
| 3 | Equador | 130 |
| 4 | Nigeria | 11 |
| 0 | Russia | 121 |


![pandas%20series.png](attachment:pandas%20series.png)

In [None]:
#pd.Series creates a one-dimensional labeled array
data = pd.Series([1., -999., 2., -999., -1000., 3.])
data

0       1.0
1    -999.0
2       2.0
3    -999.0
4   -1000.0
5       3.0
dtype: float64

![pandas%20series2.png](attachment:pandas%20series2.png)

In [24]:
#replace -999 with NaN values
data.replace(-999, np.nan, inplace=True)
data


#We can also replace multiple values at once.
data = pd.Series([1., -999., 2., -999., -1000., 3.])
data.replace([-999,-1000],np.nan,inplace=True)
data

0    1.0
1    NaN
2    2.0
3    NaN
4    NaN
5    3.0
dtype: float64
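
When different placeholder values should get different replacements, `replace()` also accepts a dictionary. A minimal sketch; mapping `-1000` to `0` is an arbitrary choice for illustration. Without `inplace=True`, a new Series is returned and `data` is left unchanged:

```python
import numpy as np
import pandas as pd

data = pd.Series([1., -999., 2., -999., -1000., 3.])

# Each key is a value to find, each dict value is its replacement
cleaned = data.replace({-999: np.nan, -1000: 0})

print(cleaned.isna().sum())   # 2 (the two -999 entries)
```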

Now, let's learn how to rename columns and index labels (row names).

In [None]:
data = pd.DataFrame(np.arange(12).reshape((3, 4)),index=['Ohio', 'Colorado', 'New York'],columns=['one', 'two', 'three', 'four'])
data

| | one | two | three | four |
|---|---|---|---|---|
| Ohio | 0 | 1 | 2 | 3 |
| Colorado | 4 | 5 | 6 | 7 |
| New York | 8 | 9 | 10 | 11 |


In [None]:
# Rename column and row names as in the picture
data.rename(index = {'Ohio':'SanF'}, columns={'one':'one_p','two':'two_p'},inplace=True)
data

| | one_p | two_p | three | four |
|---|---|---|---|---|
| SanF | 0 | 1 | 2 | 3 |
| Colorado | 4 | 5 | 6 | 7 |
| New York | 8 | 9 | 10 | 11 |
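
Besides dictionaries, `rename()` accepts a function that is applied to every label. A small sketch on the same DataFrame (upper-casing the columns and lower-casing the index are just illustrative choices):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(12).reshape((3, 4)),
                    index=['Ohio', 'Colorado', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# A function passed as `columns` is called on every column label
upper = data.rename(columns=str.upper)
print(list(upper.columns))   # ['ONE', 'TWO', 'THREE', 'FOUR']

# The same works for the row labels
lower = data.rename(index=str.lower)
print(list(lower.index))     # ['ohio', 'colorado', 'new york']
```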


Let's proceed and learn about grouping data and creating pivots in pandas. It's an immensely important data analysis method which you'd probably have to use on every data set you work with.

In [29]:
df = pd.DataFrame({'key1' : ['a', 'a', 'b', 'b', 'a'],
                   'key2' : ['one', 'two', 'one', 'two', 'one'],
                   'data1' : np.random.randn(5),
                   'data2' : np.random.randn(5)})
df

| | key1 | key2 | data1 | data2 |
|---|---|---|---|---|
| 0 | a | one | 0.171684 | -0.514774 |
| 1 | a | two | -0.076587 | -0.717259 |
| 2 | b | one | 0.924721 | 2.971275 |
| 3 | b | two | -0.627941 | -1.2468 |
| 4 | a | one | 2.039414 | -1.893856 |


In [30]:
#calculate the mean of data1 column by key1

In [31]:
grouped = df['data1'].groupby(df['key1'])
grouped.mean()

key1
a    0.711504
b    0.148390
Name: data1, dtype: float64
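
The grouping paragraph also mentions pivots, which the cells above do not show. Here is a minimal sketch, using fixed hypothetical numbers instead of `np.random.randn` so the result is reproducible, that groups by both keys and then lays out the same means with `pivot_table`:

```python
import pandas as pd

# Hypothetical fixed values (the cell above uses random data)
df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': [1.0, 2.0, 3.0, 4.0, 5.0]})

# Group by two keys: one mean per (key1, key2) combination
by_both = df.groupby(['key1', 'key2'])['data1'].mean()

# The same means as a pivot table: key1 values as rows, key2 values as columns
pivot = df.pivot_table(values='data1', index='key1', columns='key2', aggfunc='mean')
print(pivot)
```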

Now, let's see how to slice the data frame.

In [32]:
dates = pd.date_range('20130101',periods=6)
df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))
df

| | A | B | C | D |
|---|---|---|---|---|
| 2013-01-01 | 0.264467 | 2.097939 | 0.420385 | -1.232054 |
| 2013-01-02 | 0.230363 | 0.377667 | 0.469629 | 2.07671 |
| 2013-01-03 | -0.801208 | -0.288529 | -2.447009 | 0.183966 |
| 2013-01-04 | -2.231626 | 0.643234 | 1.192901 | 0.753847 |
| 2013-01-05 | -0.392302 | -0.200442 | 0.476119 | -0.943432 |
| 2013-01-06 | -0.987661 | -0.165508 | -0.05398 | -1.745643 |


In [None]:
#get the first n rows (here n = 3) from the data frame

In [None]:
df[:3]

| | A | B | C | D |
|---|---|---|---|---|
| 2013-01-01 | -0.248687 | -1.796418 | -0.267767 | -0.182899 |
| 2013-01-02 | 0.075648 | 0.937348 | -0.432499 | 0.8423 |
| 2013-01-03 | 0.054584 | -0.842191 | 1.18828 | 0.057714 |


In [None]:
#slice based on date range

In [None]:
df['20130101':'20130104']

| | A | B | C | D |
|---|---|---|---|---|
| 2013-01-01 | -0.248687 | -1.796418 | -0.267767 | -0.182899 |
| 2013-01-02 | 0.075648 | 0.937348 | -0.432499 | 0.8423 |
| 2013-01-03 | 0.054584 | -0.842191 | 1.18828 | 0.057714 |
| 2013-01-04 | 0.156224 | 0.110722 | 0.790329 | -0.53326 |


In [None]:
#slicing based on column names

In [None]:
df.loc[:,['A','B']]

| | A | B |
|---|---|---|
| 2013-01-01 | -0.248687 | -1.796418 |
| 2013-01-02 | 0.075648 | 0.937348 |
| 2013-01-03 | 0.054584 | -0.842191 |
| 2013-01-04 | 0.156224 | 0.110722 |
| 2013-01-05 | 1.256793 | -0.930222 |
| 2013-01-06 | 0.968257 | 1.689979 |


In [None]:
#slicing based on both row index labels and column names
df.loc['20130102':'20130103',['A','B']]

| | A | B |
|---|---|---|
| 2013-01-02 | 0.075648 | 0.937348 |
| 2013-01-03 | 0.054584 | -0.842191 |


In [42]:
#positional indexing with iloc
#returns the 4th row (position 3, since positions start at zero)
df.iloc[3]

A   -2.231626
B    0.643234
C    1.192901
D    0.753847
Name: 2013-01-04 00:00:00, dtype: float64

In [45]:
df

| | A | B | C | D |
|---|---|---|---|---|
| 2013-01-01 | 0.264467 | 2.097939 | 0.420385 | -1.232054 |
| 2013-01-02 | 0.230363 | 0.377667 | 0.469629 | 2.07671 |
| 2013-01-03 | -0.801208 | -0.288529 | -2.447009 | 0.183966 |
| 2013-01-04 | -2.231626 | 0.643234 | 1.192901 | 0.753847 |
| 2013-01-05 | -0.392302 | -0.200442 | 0.476119 | -0.943432 |
| 2013-01-06 | -0.987661 | -0.165508 | -0.05398 | -1.745643 |


In [44]:
#select specific rows and columns by position, using lists of row positions and column positions
df.iloc[[1,5],[0,2]]

| | A | C |
|---|---|---|
| 2013-01-02 | 0.230363 | 0.469629 |
| 2013-01-06 | -0.987661 | -0.05398 |
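
One detail worth keeping in mind when choosing between `loc` and `iloc`: a label slice with `loc` includes its end label, while a position slice with `iloc` excludes its end position. The sketch below uses a deterministic `np.arange` DataFrame (an assumption, so the row counts are predictable) with the same date index:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

# Label-based slicing with loc INCLUDES the end label
label_slice = df.loc['20130102':'20130104']
print(len(label_slice))      # 3 rows: Jan 2, 3 and 4

# Position-based slicing with iloc EXCLUDES the end position, like Python lists
position_slice = df.iloc[1:3]
print(len(position_slice))   # 2 rows: positions 1 and 2
```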


Similarly, we can do Boolean indexing based on column values as well. This helps in filtering a data set based on a pre-defined condition.



In [46]:
df

| | A | B | C | D |
|---|---|---|---|---|
| 2013-01-01 | 0.264467 | 2.097939 | 0.420385 | -1.232054 |
| 2013-01-02 | 0.230363 | 0.377667 | 0.469629 | 2.07671 |
| 2013-01-03 | -0.801208 | -0.288529 | -2.447009 | 0.183966 |
| 2013-01-04 | -2.231626 | 0.643234 | 1.192901 | 0.753847 |
| 2013-01-05 | -0.392302 | -0.200442 | 0.476119 | -0.943432 |
| 2013-01-06 | -0.987661 | -0.165508 | -0.05398 | -1.745643 |


In [56]:
df.B > 1

2013-01-01     True
2013-01-02    False
2013-01-03    False
2013-01-04    False
2013-01-05    False
2013-01-06    False
Freq: D, Name: B, dtype: bool

In [48]:
df[df.B > 1]

| | A | B | C | D |
|---|---|---|---|---|
| 2013-01-01 | 0.264467 | 2.097939 | 0.420385 | -1.232054 |


In [55]:
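#displayed values are rounded to 6 decimals: the B value shown as 0.377667 is actually slightly below the threshold, so that row is included
df[df.B < 0.377667]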

| | A | B | C | D |
|---|---|---|---|---|
| 2013-01-02 | 0.230363 | 0.377667 | 0.469629 | 2.07671 |
| 2013-01-03 | -0.801208 | -0.288529 | -2.447009 | 0.183966 |
| 2013-01-05 | -0.392302 | -0.200442 | 0.476119 | -0.943432 |
| 2013-01-06 | -0.987661 | -0.165508 | -0.05398 | -1.745643 |
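
Conditions can also be combined. In this sketch, again on a deterministic `np.arange` DataFrame rather than random data, `&`, `|` and `isin()` filter rows. Each comparison must be wrapped in parentheses because `&` and `|` bind more tightly than `>` and `<` in Python:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

# & means "and", | means "or"; parenthesize each comparison
both = df[(df.A > 4) & (df.B < 18)]
either = df[(df.A < 4) | (df.D > 20)]

# isin() keeps rows whose value appears in the given list
picked = df[df.C.isin([2, 10])]

print(both.A.tolist())     # [8, 12, 16]
```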


## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:left; border-radius:10px 10px;">Thanks! Keep working to succeed.</p>