# Introducing Pandas

Pandas is a Python library that makes handling tabular data easier. Since we are doing data science, it is something we will use often.

It is one of three libraries you will encounter repeatedly in the field of data science. This section introduces Pandas and NumPy.

## Pandas 
Pandas is a powerful and flexible open-source data manipulation and analysis library for Python. It provides data structures like DataFrames and Series to handle and analyze data efficiently. It is especially useful for tasks involving data cleaning, transformation, and analysis.

### Key Features of Pandas
- Data Structures: Provides two primary data structures - Series (one-dimensional) and DataFrame (two-dimensional).
- Data Handling: Easy handling of missing data.
- Data Alignment: Automatic and explicit data alignment.
- Data Wrangling: Tools for merging, joining, reshaping, and pivoting data.
- Time Series: Robust support for working with time-series data.
- Data Aggregation: Group by functionality for data aggregation and transformations. 
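
A few of the features listed above can be sketched in a short, self-contained example. The variable names and sample values here (s1, s2, people, cities, daily) are made up for illustration and are not part of the original notebook.

```python
import pandas as pd

# Series: a one-dimensional labeled array
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Automatic alignment: values are matched by index label, not by position.
# Labels that appear in only one Series become NaN (missing) in the result.
total = s1 + s2
print(total)

# Data wrangling: merge two DataFrames on a shared key column
people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
cities = pd.DataFrame({'Name': ['Alice', 'Bob'],
                       'City': ['New York', 'Los Angeles']})
merged = people.merge(cities, on='Name')
print(merged)

# Time series: with a date index, values can be resampled into 2-day sums
dates = pd.date_range('2024-01-01', periods=4, freq='D')
daily = pd.Series([1, 2, 3, 4], index=dates)
two_day = daily.resample('2D').sum()
print(two_day)
```

Here the labels 'a' and 'd' exist in only one Series, so their sums are missing, while 'b' and 'c' are added label by label.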


In [3]:
# Creating a DataFrame
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago


In [None]:
# Reading data from CSV file
df = pd.read_csv('data.csv')

# Writing data to a CSV file
df.to_csv('output.csv', index=False)  # index=False leaves the row labels out of the file
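
The cell above needs a data.csv file on disk. As a sketch that runs without any files, the same round trip can be tried against an in-memory text buffer. The use of io.StringIO and the sample data are assumptions made for illustration.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Write to an in-memory text buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False omits the row labels

csv_text = buffer.getvalue()
print(csv_text)

# Read the same text back into a new DataFrame
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```

Without index=False, the row labels would be written as an extra unnamed first column and read back as data.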

In [1]:
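# Build a small DataFrame so this cell runs on its own
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})

# Selecting a column
ages = df['Age']

# Filtering rows based on a condition
filtered_df = df[df['Age'] > 25]

# Filling missing values with 0 (modifies df in place)
df.fillna(0, inplace=True)

# Dropping rows with missing values (nothing is left to drop after fillna)
df.dropna(inplace=True)

# Group by city and average the numeric Age column
grouped = df.groupby('City')['Age'].mean()
print(grouped)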
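
To see what filling and dropping actually do, it helps to start from data that really has a missing value. The following is a minimal sketch with made-up values. It uses the non-inplace forms, which return new DataFrames and leave the original untouched.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'City': ['New York', 'Chicago', 'Chicago', 'New York'],
    'Age': [25, np.nan, 35, 45],
})

# fillna returns a copy with missing values replaced (here by the column mean)
filled = df.fillna({'Age': df['Age'].mean()})

# dropna returns a copy without the rows that contain missing values
dropped = df.dropna()

# Select the numeric column before aggregating so text columns are not averaged
mean_age = dropped.groupby('City')['Age'].mean()
print(filled)
print(dropped)
print(mean_age)
```

Filling keeps all four rows, while dropping removes the row with the missing age, so the two approaches can give different aggregates on real data.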

# NumPy

NumPy (Numerical Python) is a fundamental package for scientific computing in Python. It provides support for arrays, matrices, and a wide range of mathematical functions to operate on these data structures.

## Key Features of NumPy

- N-dimensional array: Provides an N-dimensional array object ('ndarray').
- Mathematical functions: Contains a large collection of mathematical functions to operate on arrays.
- Linear Algebra: Includes linear algebra, Fourier transform, and random number capabilities.
- Efficiency: Offers fast and efficient array operations.
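
The features above can be illustrated with a short sketch. The array named grid and its values are chosen only for this example.

```python
import numpy as np

# ndarray: a single-type, N-dimensional array
grid = np.array([[1.0, 2.0], [3.0, 4.0]])
print(grid.ndim, grid.shape, grid.dtype)

# Linear algebra: determinant and inverse of a 2x2 matrix
det = np.linalg.det(grid)   # 1*4 - 2*3, so approximately -2.0
inv = np.linalg.inv(grid)

# A matrix times its inverse gives the identity matrix (up to rounding)
print(np.allclose(grid @ inv, np.eye(2)))

# Fourier transform of a constant signal: only the first bin is non-zero
spectrum = np.fft.fft(np.ones(4))
print(spectrum.real)
```

Comparisons use np.allclose rather than == because floating-point results are only accurate up to rounding error.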






In [4]:
# Creating arrays
import numpy as np

# Creating 1D array
arr = np.array([1, 2, 3, 4, 5])

# Creating a 2D array
matrix = np.array([[1, 2, 3], [4, 5, 6]])

# Array Operations

# Element-wise operations
arr = np.array([1, 2, 3])
print(arr + 1)

# Matrix multiplication
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
print(np.dot(a, b))  # Output: [[19 22], [43 50]]

# Mathematical Functions

# Trigonometric functions
angles = np.array([0, np.pi/2, np.pi])
print(np.sin(angles))

# Statistical functions
data = np.array([1, 2, 3, 4, 5])
print(np.mean(data))
print(np.std(data))

# Array manipulation 
arr = np.array([[1, 2, 3], [4, 5, 6]])

# Reshaping
reshaped = arr.reshape((3, 2))
print(reshaped)

# Flattening
flattened = arr.flatten()
print(flattened)

# Random number generation
random_numbers = np.random.rand(3,3)
print(random_numbers)

[2 3 4]
[[19 22]
 [43 50]]
[0.0000000e+00 1.0000000e+00 1.2246468e-16]
3.0
1.4142135623730951
[[1 2]
 [3 4]
 [5 6]]
[1 2 3 4 5 6]
[[0.64246849 0.48602337 0.66649481]
 [0.24451288 0.17058515 0.22393195]
 [0.02251887 0.27323625 0.11284583]]
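
The random values printed above will differ on every run. When repeatable results are needed, one option is a seeded generator. This is a sketch using np.random.default_rng, and the seed value 42 is an arbitrary choice for illustration.

```python
import numpy as np

# Assumption: seed 42 is arbitrary; any fixed integer gives repeatable output
rng = np.random.default_rng(42)
sample = rng.random((3, 3))  # uniform values in [0, 1)

# A second generator with the same seed reproduces the same values
rng_again = np.random.default_rng(42)
sample_again = rng_again.random((3, 3))

print(np.array_equal(sample, sample_again))  # True
```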


# Combining Pandas and NumPy

Pandas and NumPy often work together to handle and analyze data efficiently. For example, you can use NumPy functions to perform mathematical operations on the columns of a Pandas DataFrame.

In [6]:
import pandas as pd
import numpy as np


# Creating a DataFrame
data = {
    'A': [1, 2, 3],
    'B': [4, 5, 6]
}

df = pd.DataFrame(data)


# Applying a NumPy function to a DataFrame Column
df['C'] = np.sqrt(df['A']**2 + df['B']**2)
print(df)

   A  B         C
0  1  4  4.123106
1  2  5  5.385165
2  3  6  6.708204
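
Data can also move in the other direction. The sketch below converts a DataFrame to a NumPy array and back. The column names Size, A_sq, and B_sq are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# to_numpy() returns the DataFrame's values as a NumPy array
values = df.to_numpy()
print(values.shape)  # (3, 2)

# np.where picks one of two values per row based on a condition
df['Size'] = np.where(df['A'] + df['B'] > 6, 'large', 'small')
print(df)

# Build a new DataFrame from a NumPy array
squares = pd.DataFrame(values ** 2, columns=['A_sq', 'B_sq'])
print(squares)
```

The array is extracted before the text column Size is added, so it keeps an integer dtype.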
