## NumPy

**NumPy** is a Python library for working with arrays. It also provides functions for linear algebra, Fourier transforms, and matrix operations.

Let's start by installing NumPy using pip. In a notebook environment such as Google Colab, you can run pip directly within a cell by prefixing the command with an exclamation mark:

In [1]:
!pip install numpy

In [2]:
import numpy as np

### NumPy Arrays

NumPy’s main object is the homogeneous multidimensional array. It is a table of elements, all of the same type, indexed by a tuple of non-negative integers.

In [4]:
# Create a 3D NumPy array from nested lists (2 blocks, 3 rows, 1 column)
arr = np.array([[[0], [1], [2]], [[3], [4], [5]]])
print(arr)

[[[0]
  [1]
  [2]]

 [[3]
  [4]
  [5]]]


In [5]:
arr.shape

(2, 3, 1)
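
Because every element of an array shares one type, NumPy picks a common `dtype` when the input mixes types. The sketch below illustrates this; the variable names `mixed` and `ints` are only illustrative.

```python
import numpy as np

# Mixing integers and a float: every element is stored as float64
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)  # float64

# A specific dtype can also be requested when the array is created
ints = np.array([1, 2, 3], dtype=np.int32)
print(ints.dtype)   # int32
```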

### Array Attributes

Each array has the attributes `ndim` (the number of dimensions), `shape` (the size of each dimension), and `size` (the total number of elements in the array):

In [6]:
print(arr.ndim)
print(arr.shape)
print(arr.size)

3
(2, 3, 1)
6


### Mathematical Operations

Mathematical operations like addition, subtraction, multiplication, and division can be performed element-wise on NumPy arrays.

In [7]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b)
print(a - b)
print(a * b)
print(a / b)


[5 7 9]
[-3 -3 -3]
[ 4 10 18]
[0.25 0.4  0.5 ]
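
Element-wise operations also work between an array and a scalar, or between arrays of compatible shapes, through a mechanism NumPy calls broadcasting. A minimal sketch, with illustrative variable names:

```python
import numpy as np

a = np.array([1, 2, 3])

# A scalar is applied to every element
doubled = a * 2     # [2 4 6]
shifted = a + 10    # [11 12 13]
squared = a ** 2    # [1 4 9]

# A 1D array is applied to each row of a 2D array with matching columns
matrix = np.array([[1, 2, 3], [4, 5, 6]])
offsets = np.array([10, 20, 30])
combined = matrix + offsets  # [[11 22 33], [14 25 36]]

print(doubled, shifted, squared)
print(combined)
```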


NumPy arrays are not limited to one dimension. You can create 2D and higher-dimensional arrays, and access a single element by giving one index per dimension.

In [8]:
# Creating a 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr_2d)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [11]:
print(arr_2d[2, 2])

9


### Array Indexing and Slicing

You can access array elements through indexing and slice arrays to obtain different sub-arrays.

In [19]:
# Slicing
print(arr_2d[-2:, :])  # Last two rows, all columns: [[4, 5, 6], [7, 8, 9]]

[[4 5 6]
 [7 8 9]]
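
A few more slicing patterns may help here. The sketch below also shows that a basic slice is a view onto the original data rather than a copy; the arrays `base`, `window`, and `safe` are illustrative names.

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

first_col = arr_2d[:, 0]      # every row, first column: [1 4 7]
top_left = arr_2d[:2, :2]     # first two rows and columns: [[1 2], [4 5]]
every_other = arr_2d[::2, :]  # rows 0 and 2: [[1 2 3], [7 8 9]]
print(first_col, top_left, every_other, sep="\n")

# Basic slices are views: writing through the slice changes the original
base = np.arange(6)           # [0 1 2 3 4 5]
window = base[1:4]
window[0] = 99
print(base)                   # [ 0 99  2  3  4  5]

# Use .copy() when an independent array is needed
safe = base[1:4].copy()
safe[1] = -1
print(base[2])                # 2 (unchanged)
```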


### Array Operations

NumPy provides functions that change an array's shape or layout, such as `reshape` and `transpose`.

In [22]:
# Reshape
arr_reshaped = np.reshape(arr_2d, (1, 9))  # Reshape the 3x3 array to 1x9
print(arr_reshaped)

[[1 2 3 4 5 6 7 8 9]]


In [23]:
# Transpose
arr_transposed = np.transpose(arr_2d)
print(arr_transposed)

[[1 4 7]
 [2 5 8]
 [3 6 9]]
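
A reshape only succeeds when the new shape holds exactly the same number of elements as the original; otherwise NumPy raises a `ValueError`. The sketch below shows this, along with `-1` (which asks NumPy to infer one dimension) and the `.T` shorthand for transposing.

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

flat = arr_2d.reshape(-1)       # infer the length: shape (9,)
column = arr_2d.reshape(-1, 1)  # infer the row count: shape (9, 1)
print(flat.shape, column.shape)

# 9 elements cannot fill a 2x5 array, so this raises ValueError
try:
    arr_2d.reshape(2, 5)
    reshape_failed = False
except ValueError as exc:
    reshape_failed = True
    print("reshape failed:", exc)

# .T is shorthand for np.transpose on a 2D array
same_as_transpose = np.array_equal(arr_2d.T, np.transpose(arr_2d))
print(same_as_transpose)        # True
```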


### More Mathematical Functions

NumPy provides aggregate functions such as `sum`, `mean`, `max`, and `min`, which by default reduce the whole array to a single value.

In [24]:
# Sum
print(np.sum(arr_2d))

# Mean
print(np.mean(arr_2d))

# Max
print(np.max(arr_2d))

# Min
print(np.min(arr_2d))

45
5.0
9
1
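
These functions also accept an `axis` argument to aggregate along one dimension instead of over the whole array. A short sketch using the same 3x3 array:

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

col_sums = arr_2d.sum(axis=0)    # collapse rows, one value per column: [12 15 18]
row_sums = arr_2d.sum(axis=1)    # collapse columns, one value per row: [ 6 15 24]
row_means = arr_2d.mean(axis=1)  # [2. 5. 8.]
col_max = arr_2d.max(axis=0)     # [7 8 9]

print(col_sums, row_sums, row_means, col_max)
```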


### Random Numbers

NumPy provides functions to generate random numbers.

In [None]:
# Generate a random number between 0 and 1
print(np.random.rand())

# Generate an array of random numbers
print(np.random.rand(2, 2))

0.006766850570029459
[[0.36148881 0.54352197]
 [0.69958403 0.19550858]]
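
The values above change on every run. When reproducible results are needed, one option is NumPy's `Generator` interface created with a fixed seed. This is a sketch of that alternative, not the approach used in the cell above; the seed value 42 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

sample_a = rng_a.random((2, 2))  # floats in [0, 1)
sample_b = rng_b.random((2, 2))
print(np.array_equal(sample_a, sample_b))  # True

# Random integers from 0 (inclusive) to 10 (exclusive)
dice = rng_a.integers(0, 10, size=5)
print(dice)
```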


### Saving and Loading

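You can save NumPy arrays to disk and load them back. `np.save` appends the `.npy` extension to the filename if it is not already present, which is why the array is loaded from `my_array.npy`.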

In [25]:
# Saving an array
np.save('my_array', arr_2d)

# Loading an array
arr_loaded = np.load('my_array.npy')
print(arr_loaded)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
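
To store several arrays in one file, `np.savez` writes an `.npz` archive in which each array is retrieved by a keyword name. A minimal sketch, writing to a temporary directory; the file name `arrays.npz` and the keys `first` and `second` are illustrative.

```python
import os
import tempfile

import numpy as np

first = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
second = np.array([0.5, 1.5, 2.5])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "arrays.npz")

    # Each keyword argument becomes a named entry in the archive
    np.savez(path, first=first, second=second)

    # The archive is closed when the with-block ends
    with np.load(path) as archive:
        loaded_first = archive["first"]
        loaded_second = archive["second"]

print(loaded_first)
print(loaded_second)
```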


## Pandas

**Pandas** is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.

As with NumPy, install Pandas using pip:

In [38]:
!pip install pandas




In [39]:
import pandas as pd

### Pandas DataFrame

A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns).

In [40]:
# Create a simple pandas DataFrame
data = {'Name': ['John', 'Mary', 'Amy', 'Harry'],
        'Age': [20, 30, 40, 30],
        'Salary': [20000, 40000, 60000, 60000]}

df = pd.DataFrame(data)

print(df)


    Name  Age  Salary
0   John   20   20000
1   Mary   30   40000
2    Amy   40   60000
3  Harry   30   60000


In [41]:
df.columns

Index(['Name', 'Age', 'Salary'], dtype='object')

In [42]:
df.index

RangeIndex(start=0, stop=4, step=1)

In [43]:
df.sort_values(by=['Name', 'Salary'])

    Name  Age  Salary
2    Amy   40   60000
3  Harry   30   60000
0   John   20   20000
1   Mary   30   40000
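
`sort_values` can also sort each key in a different direction through the `ascending` parameter. The sketch below rebuilds the same DataFrame and sorts by salary (highest first), breaking ties by name.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Mary', 'Amy', 'Harry'],
                   'Age': [20, 30, 40, 30],
                   'Salary': [20000, 40000, 60000, 60000]})

# Salary descending, then Name ascending for equal salaries
by_salary = df.sort_values(by=['Salary', 'Name'], ascending=[False, True])
print(by_salary)

# The original row labels are kept; reset_index renumbers them from 0
renumbered = by_salary.reset_index(drop=True)
print(renumbered)
```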


### Data Selection

You can select data from a DataFrame using column names, or using conditions.

In [44]:
# Selecting a single column
print(df['Name'])

0     John
1     Mary
2      Amy
3    Harry
Name: Name, dtype: object


In [45]:

# Selecting rows where 'Name' is 'John'
print(df[df['Name'] == 'John'])


   Name  Age  Salary
0  John   20   20000


In [46]:
# Selecting multiple columns by passing a list of column names
df_subset = df[['Name', 'Age']]
df_subset.head()

    Name  Age
0   John   20
1   Mary   30
2    Amy   40
3  Harry   30


In [47]:
# Filter rows using .loc:
df.loc[df['Age'] > 30]

  Name  Age  Salary
2  Amy   40   60000
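
Conditions can be combined with `&` (and) and `|` (or), with each condition wrapped in parentheses, and `.loc` can select columns in the same step. A short sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Mary', 'Amy', 'Harry'],
                   'Age': [20, 30, 40, 30],
                   'Salary': [20000, 40000, 60000, 60000]})

# Both conditions must hold; the parentheses are required
senior_high_paid = df[(df['Age'] >= 30) & (df['Salary'] > 40000)]
print(senior_high_paid)   # rows for Amy and Harry

# Either condition may hold
young_or_top = df[(df['Age'] < 25) | (df['Salary'] >= 60000)]
print(young_or_top)       # rows for John, Amy and Harry

# Filter rows and pick columns at once with .loc
names_over_30 = df.loc[df['Age'] > 30, ['Name', 'Salary']]
print(names_over_30)      # Amy only
```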


Group by a column and perform aggregations:

In [48]:

df.groupby('Age').agg({'Salary': 'sum', 'Age': 'count'})

     Salary  Age
Age
20    20000    1
30   100000    2
40    60000    1
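
In the result above, the count column is labeled `Age`, the same as the index, which can be confusing. Named aggregation gives each output column an explicit name; the names `total_salary` and `people` below are illustrative choices, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Mary', 'Amy', 'Harry'],
                   'Age': [20, 30, 40, 30],
                   'Salary': [20000, 40000, 60000, 60000]})

# Each keyword is an output column: (source column, aggregation function)
summary = df.groupby('Age').agg(
    total_salary=('Salary', 'sum'),
    people=('Name', 'count'),
)
print(summary)
```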


### Data Cleaning

Pandas provides several methods for data cleaning. One of the most commonly used is `dropna()`, which by default removes every row that contains a missing value.

In [49]:
# Creating a DataFrame with missing values
data = {'name': ['John', 'Anna', 'Peter', 'Linda'], 'age': [23, np.nan, 29, 45]}
df = pd.DataFrame(data)

# Removing rows with missing values
df = df.dropna()
print(df)


    name   age
0   John  23.0
2  Peter  29.0
3  Linda  45.0
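
Dropping rows discards the rest of the data in those rows. An alternative, sketched below, is to count the missing values with `isna()` and fill them with `fillna()`. Using the column mean as the fill value is only one possible choice and is an assumption made for this example.

```python
import numpy as np
import pandas as pd

data = {'name': ['John', 'Anna', 'Peter', 'Linda'], 'age': [23, np.nan, 29, 45]}
df = pd.DataFrame(data)

# Count missing values in the 'age' column
missing_count = df['age'].isna().sum()
print(missing_count)  # 1

# Replace missing ages with the mean of the known ages
filled = df.fillna({'age': df['age'].mean()})
print(filled)
```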


### Data Aggregation

Pandas provides aggregation functions such as `count()`, `sum()`, `mean()`, `median()`, `min()`, and `max()`.

In [50]:
# Data Aggregation
print(df['age'].mean())


32.333333333333336
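
Several of these aggregations can be computed in one call by passing a list of function names to `agg()`. The sketch below uses a standalone Series holding the three ages left after `dropna()`.

```python
import pandas as pd

ages = pd.Series([23.0, 29.0, 45.0], name='age')

# One result per function name, returned as a Series
stats = ages.agg(['count', 'mean', 'median', 'min', 'max'])
print(stats)
```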


### Merging DataFrames

`pd.merge()` combines two DataFrames by matching rows on a shared key column.

In [51]:
left = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
                     'A': ['A0', 'A1', 'A2']})

left



  key   A
0  K0  A0
1  K1  A1
2  K2  A2


In [53]:
right = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
                     'B': ['B0', 'B1', 'B2']})

right



  key   B
0  K0  B0
1  K1  B1
2  K2  B2


In [54]:
pd.merge(left, right, on='key')

  key   A   B
0  K0  A0  B0
1  K1  A1  B1
2  K2  A2  B2
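
When the keys do not match exactly, the `how` parameter controls which rows are kept. The sketch below uses a hypothetical variation of `right` whose keys are K1 to K3, so K0 appears only on the left and K3 only on the right.

```python
import pandas as pd

left = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
                     'A': ['A0', 'A1', 'A2']})
# Hypothetical variation: K0 is missing here and K3 is extra
right = pd.DataFrame({'key': ['K1', 'K2', 'K3'],
                      'B': ['B1', 'B2', 'B3']})

# Default (inner): only keys present in both frames
inner = pd.merge(left, right, on='key')
print(inner)

# Left: every row of `left`, with NaN where `right` has no match
left_join = pd.merge(left, right, on='key', how='left')
print(left_join)

# Outer: all keys; indicator=True adds a `_merge` column with each row's origin
outer = pd.merge(left, right, on='key', how='outer', indicator=True)
print(outer)
```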
