In [1]:
import pandas as pd
import numpy as np
# 'as' gives the library a short alias (abbreviation).

## Pandas

In [3]:
# pd.read_csv loads a CSV file into a DataFrame
# syntax:
# df = pd.read_csv("csv file path")
# for Excel files we use pd.read_excel()
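
As a minimal sketch of the loading step, the example below reads CSV text from an in-memory buffer so that it runs without a file on disk. The CSV content is made up for illustration; with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical CSV content; a real call would pass a file path instead of a buffer.
csv_text = "Album,Released\nThriller,1982\nBack in Black,1980\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)    # (number of rows, number of columns)
print(df.head())   # first rows of the loaded DataFrame
```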

In [5]:
# df.head() returns the first rows of the DataFrame (5 by default)

In [6]:
# how to create a DataFrame from a dictionary of columns
songs = {'Album': ['Thriller', 'Back in Black', 'The Dark Side of the Moon', 'The Bodyguard', 'Bat out of Hell'],
        'Released': [1982, 1980, 1973, 1992, 1977],
        'Length': ['00:42:19', '00:42:11', '00:42:49', '00:57:44', '00:46:33']}
songs_frame = pd.DataFrame(songs)

In [8]:
songs_frame.head()

|   | Album | Released | Length |
|---|---|---|---|
| 0 | Thriller | 1982 | 00:42:19 |
| 1 | Back in Black | 1980 | 00:42:11 |
| 2 | The Dark Side of the Moon | 1973 | 00:42:49 |
| 3 | The Bodyguard | 1992 | 00:57:44 |
| 4 | Bat out of Hell | 1977 | 00:46:33 |


In [9]:
# to create a new DataFrame from an existing one, index it with a list of column names
x = songs_frame[['Length']]
# the list can contain multiple column names

In [10]:
x

|   | Length |
|---|---|
| 0 | 00:42:19 |
| 1 | 00:42:11 |
| 2 | 00:42:49 |
| 3 | 00:57:44 |
| 4 | 00:46:33 |
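
The difference between single and double brackets is easy to miss, so here is a small sketch. It rebuilds a shortened copy of the frame so that it runs on its own; the variable names other than `songs_frame` are illustrative.

```python
import pandas as pd

songs_frame = pd.DataFrame({
    "Album": ["Thriller", "Back in Black"],
    "Released": [1982, 1980],
    "Length": ["00:42:19", "00:42:11"],
})

length_series = songs_frame["Length"]            # single brackets return a Series
length_frame = songs_frame[["Length"]]           # double brackets return a DataFrame
album_year = songs_frame[["Album", "Released"]]  # several columns at once

print(type(length_series).__name__)
print(type(length_frame).__name__)
print(album_year.shape)
```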


In [12]:
songs_frame['Length'].unique()

array(['00:42:19', '00:42:11', '00:42:49', '00:57:44', '00:46:33'],
      dtype=object)

In [13]:
# to select rows that satisfy a condition we use boolean indexing
recent_albums = songs_frame[songs_frame['Released'] >= 1980]

In [14]:
recent_albums

|   | Album | Released | Length |
|---|---|---|---|
| 0 | Thriller | 1982 | 00:42:19 |
| 1 | Back in Black | 1980 | 00:42:11 |
| 3 | The Bodyguard | 1992 | 00:57:44 |
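
Boolean indexing also works with several conditions at once. The sketch below combines two comparisons with `&`; each comparison needs its own parentheses. The names `mask` and `eighties` are illustrative.

```python
import pandas as pd

songs_frame = pd.DataFrame({
    "Album": ["Thriller", "Back in Black", "The Dark Side of the Moon",
              "The Bodyguard", "Bat out of Hell"],
    "Released": [1982, 1980, 1973, 1992, 1977],
})

# The comparison produces a boolean Series; & combines two of them element-wise.
mask = (songs_frame["Released"] >= 1980) & (songs_frame["Released"] < 1990)
eighties = songs_frame[mask]

print(eighties["Album"].tolist())
```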


In [15]:
# we save the DataFrame to a file using methods like
# to_csv
# to_excel
# the file name should include the extension (for example .csv or .xlsx)
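
A minimal round-trip sketch of saving and reloading, assuming a temporary directory and a hypothetical file name `songs.csv`. Passing `index=False` keeps the row index out of the file, which avoids an extra unnamed column on reload.

```python
import os
import tempfile

import pandas as pd

songs_frame = pd.DataFrame({
    "Album": ["Thriller", "Back in Black"],
    "Released": [1982, 1980],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "songs.csv")  # hypothetical file name
    songs_frame.to_csv(csv_path, index=False)      # write without the index column
    reloaded = pd.read_csv(csv_path)               # read it back to check the result

print(reloaded)
```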

### Series Attributes and Methods

Pandas Series come with various attributes and methods to help you manipulate and analyze data effectively. Here are a few essential ones:

values: Returns the Series data as a NumPy array.

index: Returns the index (labels) of the Series.

shape: Returns a tuple representing the dimensions of the Series.

size: Returns the number of elements in the Series.

mean(), sum(), min(), max(): Calculate summary statistics of the data.

unique(), nunique(): Get unique values or the number of unique values.

sort_values(), sort_index(): Sort the Series by values or index labels.

isnull(), notnull(): Check for missing (NaN) or non-missing values.

apply(): Apply a custom function to each element of the Series.
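
The attributes and methods listed above can be tried on a small Series. This is only a sketch; the Series reuses the release years from the earlier example, and the lambda passed to `apply()` is an illustrative choice.

```python
import pandas as pd

released = pd.Series([1982, 1980, 1973, 1992, 1977], name="Released")

print(released.values)                   # underlying NumPy array
print(released.index)                    # index labels
print(released.shape, released.size)     # dimensions and number of elements
print(released.min(), released.max())    # summary statistics
print(released.nunique())                # number of distinct values
print(released.isnull().any())           # whether any value is missing

sorted_years = released.sort_values()
decades = released.apply(lambda year: year // 10 * 10)  # round each year down to its decade
print(sorted_years.tolist())
print(decades.tolist())
```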

### DataFrame Attributes and Methods

DataFrames provide numerous attributes and methods for data manipulation and analysis, including:

shape: Returns the dimensions (number of rows and columns) of the DataFrame.

info(): Provides a summary of the DataFrame, including data types and non-null counts.

describe(): Generates summary statistics for numerical columns.

head(), tail(): Displays the first or last n rows of the DataFrame.

mean(), sum(), min(), max(): Calculate summary statistics for columns.

sort_values(): Sort the DataFrame by one or more columns.

groupby(): Group data based on specific columns for aggregation.

fillna(), drop(), rename(): Fill missing values, drop rows or columns, or rename row and column labels.

apply(): Apply a function to each element, row, or column of the DataFrame.
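
A short sketch of several of these DataFrame methods working together. The `Artist` and `Sales` columns and their values are invented purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
frame = pd.DataFrame({
    "Artist": ["A", "A", "B", "B"],
    "Sales": [10.0, np.nan, 30.0, 50.0],
})

print(frame.shape)          # (rows, columns)
print(frame.describe())     # summary statistics for numeric columns

filled = frame.fillna({"Sales": 0.0})                    # replace the missing value
totals = filled.groupby("Artist")["Sales"].sum()         # aggregate per group
ranked = filled.sort_values("Sales", ascending=False)    # largest first
renamed = filled.rename(columns={"Sales": "SalesMillions"})

print(totals)
```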

## NumPy 1-D

In [16]:
# create a NumPy array; all of its elements share a single data type
a = np.array([0,1,2,3,4])

In [17]:
a

array([0, 1, 2, 3, 4])

In [23]:
# to see the type of object we use type()
print(type(a))
# and to check the datatype of array object we use .dtype
print(a.dtype)

<class 'numpy.ndarray'>
int32
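
The integer width reported by `.dtype` depends on the platform and NumPy version, so `int32` here may appear as `int64` elsewhere. As a sketch, the data type can be requested explicitly, and mixed inputs are converted to one common type:

```python
import numpy as np

ints = np.array([0, 1, 2, 3, 4], dtype=np.int64)      # request a specific integer width
floats = np.array([0, 1, 2, 3, 4], dtype=np.float64)  # same values stored as floats
mixed = np.array([1, 2.5])                            # the int is upcast so both elements are floats

print(ints.dtype, floats.dtype, mixed.dtype)
```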


In [26]:
# we use .size to get the size of the array
print(a.size)
# ndim to get the number of dimensions
print(a.ndim)
# shape to get the shape of array
print(a.shape)

5
1
(5,)


In [27]:
# we can directly change the values of array using indexing
a[0] = 10
print(a)

[10  1  2  3  4]


In [28]:
# we can slice the array like a normal Python list or string
a[0:2]

array([10,  1])
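
A few more slicing patterns, sketched on the same values. The results are converted to lists before the final assignment because a NumPy slice is a view of the original array, so later changes to `a` would otherwise show through.

```python
import numpy as np

a = np.array([10, 1, 2, 3, 4])

middle = a[1:4].tolist()         # positions 1, 2 and 3
every_second = a[::2].tolist()   # step of 2
last_two = a[-2:].tolist()       # negative indices count from the end

a[3:5] = 300, 400                # assign to a slice in place

print(middle, every_second, last_two)
print(a)
```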

## Vector Addition and Subtraction

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

In [29]:
# element-wise multiplication of two arrays is done with a * b
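
A minimal sketch of element-wise vector arithmetic; the vectors `u` and `v` are made-up examples. Each operator acts on matching positions of the two arrays.

```python
import numpy as np

u = np.array([1, 2])
v = np.array([3, 2])

added = u + v        # element-wise sum
subtracted = u - v   # element-wise difference
multiplied = u * v   # element-wise product

print(added, subtracted, multiplied)
```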

![image.png](attachment:image.png)

In [30]:
# the dot product of two vectors is computed with np.dot(a, b)
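
As a sketch with made-up vectors, the dot product multiplies matching elements and sums the results, so it returns a single number rather than an array.

```python
import numpy as np

u = np.array([1, 2])
v = np.array([3, 2])

dot_uv = np.dot(u, v)        # 1*3 + 2*2
same_result = np.sum(u * v)  # element-wise product, then sum

print(dot_uv, same_result)
```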

#### This property of applying a scalar to every element of a vector is called broadcasting
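
A minimal sketch of broadcasting with a scalar, using a made-up array. NumPy treats the scalar as if it were repeated for every element.

```python
import numpy as np

u = np.array([1, 2, 3, -1])

shifted = u + 1   # the scalar 1 is added to every element
scaled = 2 * u    # every element is multiplied by 2

print(shifted)
print(scaled)
```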

## Universal functions

In [31]:
# we use .mean() to get the mean
a.mean()

4.0

In [33]:
# we use .max() to get the max value
a.max()

10

In [34]:
# same for the min values we use .min()
a.min()

1

In [36]:
# to create an array of evenly spaced values we use the np.linspace() function
np.linspace(-2,2,num = 5)
# np.linspace(start, stop, num), where num is the number of samples (stop is included)

array([-2., -1.,  0.,  1.,  2.])
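
A common use of `np.linspace` is to sample a function at evenly spaced points. The sketch below, with illustrative variable names, applies `np.sin`, which operates element by element on the whole array.

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, num=5)  # 5 samples from 0 to 2*pi, both endpoints included
y = np.sin(x)                         # sine evaluated at each sample
step = x[1] - x[0]                    # spacing between neighbouring samples

print(x)
print(y)
print(step)
```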

## NumPy 2-D

In [2]:
a = [[11,12,13],[21,22,23],[31,32,33]]
A = np.array(a)
A

array([[11, 12, 13],
       [21, 22, 23],
       [31, 32, 33]])

In [3]:
# ndim for the number of dimensions of the array
A.ndim

2

In [4]:
# shape of the array
A.shape

(3, 3)

In [5]:
# every row must have the same length; here the second row is too short
a = [[11,12,13],[21,22],[31,32,33]]
A = np.array(a)
A

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part.
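
One way to avoid this error is to check the row lengths before building the array. The sketch below does that and, as an assumed repair strategy that is not part of the original notebook, pads short rows with zeros.

```python
import numpy as np

rows = [[11, 12, 13], [21, 22], [31, 32, 33]]

row_lengths = {len(row) for row in rows}
is_rectangular = len(row_lengths) == 1   # True only if all rows have the same length

if is_rectangular:
    A = np.array(rows)
else:
    # Assumed repair for illustration: pad each short row with zeros on the right.
    width = max(row_lengths)
    A = np.array([row + [0] * (width - len(row)) for row in rows])

print(is_rectangular)
print(A)
```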

In [6]:
# We can use indexing and slicing just as with a 1-D array

# We can also add a scalar to a matrix or multiply a matrix by a scalar
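
A short sketch of 2-D indexing, slicing, and scalar operations on the same 3x3 matrix. The first index selects the row and the second selects the column.

```python
import numpy as np

A = np.array([[11, 12, 13], [21, 22, 23], [31, 32, 33]])

element = A[1, 2]        # row 1, column 2
first_row = A[0]         # entire first row
second_col = A[:, 1]     # entire second column
top_right = A[0:2, 1:3]  # rows 0-1, columns 1-2

doubled = 2 * A          # multiply every element by a scalar
shifted = A + 1          # add a scalar to every element

print(element)
print(top_right)
```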

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

In [7]:
# matrix multiplication is done with the dot product, np.dot(A, B)

In [10]:
# To get the transpose of a matrix we can use the .T attribute
A.T


array([[11, 21, 31],
       [12, 22, 32],
       [13, 23, 33]])

In [13]:
X=np.array([[1,0],[0,1]])
Y=np.array([[2,2],[2,2]])
Z=np.dot(X,Y)
Z

array([[2, 2],
       [2, 2]])
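
Matrix multiplication also works for non-square matrices as long as the number of columns of the first equals the number of rows of the second. The sketch below uses made-up matrices and the `@` operator, which gives the same result as `np.dot` for 2-D arrays.

```python
import numpy as np

A = np.array([[0, 1, 1], [1, 0, 1]])       # shape (2, 3)
B = np.array([[1, 1], [1, 1], [-1, 1]])    # shape (3, 2)

C = A @ B                # shape (2, 2): rows of A combined with columns of B
C_dot = np.dot(A, B)     # equivalent call

print(C.shape)
print(C)
```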