# Python Libraries
#### `1. NumPy` - Numerical Python, which supports large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays
#### `2. Pandas` - a library for data manipulation and analysis
#### `3. Matplotlib` - a comprehensive library for creating static, animated, and interactive visualizations
#### `4. Seaborn` - built on Matplotlib; provides a high-level interface for drawing attractive and informative statistical graphics
#### `5. scikit-learn` - a free and open-source machine learning library

### `1. Numpy - NumPy (Numerical Python) is a powerful library for numerical computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently.`

#### Key features and concepts of NumPy:

`1. NumPy Arrays:
   The core of NumPy is the ndarray (n-dimensional array) object. Unlike Python lists, NumPy arrays are homogeneous (all elements must be of the same type) and can be multi-dimensional.`

`2. Array Creation:
   You can create NumPy arrays from Python lists with np.array(), or generate them directly with functions like np.arange() and np.linspace().`

`3. Array Attributes:
   NumPy arrays have attributes like shape (dimensions), ndim (number of dimensions), and dtype (data type of elements).`

`4. Array Operations:
   NumPy allows element-wise operations on arrays without using explicit loops, which is much faster than traditional Python loops.`

`5. Broadcasting:
   Broadcasting lets NumPy combine arrays of different but compatible shapes in arithmetic operations by virtually stretching the smaller array to match the larger one.`

`6. Indexing and Slicing:
   Similar to Python lists, but extended to multiple dimensions.`

`7. Reshaping:
   You can change the shape of an array without changing its data.`

`8. Mathematical and Statistical Functions:
   NumPy provides a wide range of mathematical functions like sqrt(), exp(), as well as statistical functions like mean(), median(), std(), etc.`

`9. Linear Algebra:
   NumPy includes basic linear algebra operations, such as matrix multiplication, computing eigenvalues, solving linear equations, etc.`

`10. Random Number Generation:
    NumPy has functions for generating random numbers, which are often used in scientific computing and machine learning.`

#### Why use NumPy?
`- Performance: NumPy operations are implemented in C, making them much faster than equivalent Python code, especially for large arrays.`

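`- Memory efficiency: NumPy arrays store elements of a single type in one contiguous block, so they typically use less memory than Python lists holding the same numbers.`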

`- Convenience: NumPy's broadcasting feature and vectorized operations make code more readable and concise.`
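
As a rough illustration of the memory point above, the sketch below compares the bytes held by a NumPy array with those of an equivalent Python list. The variable names are illustrative, and the list figure assumes CPython, where each integer is a separate object; exact numbers vary by Python version.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A NumPy array stores raw 8-byte integers in one contiguous buffer.
array_bytes = np_arr.nbytes

# A list stores pointers to separate int objects, so count both parts.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

print("Array data bytes:", array_bytes)
print("List bytes (container + elements):", list_bytes)
```

The array figure covers only the data buffer; the array object itself adds a small fixed overhead on top.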

In [None]:
import numpy as np

# 1. Creating NumPy arrays
print("1. Creating NumPy arrays:")
# From a list
arr1 = np.array([1, 2, 3, 4, 5])
print("From list:", arr1)

In [None]:
# Using np.arange
arr2 = np.arange(0, 10, 2)  # Start, stop, step
print("Using arange:", arr2)

In [None]:
# Using np.linspace
arr3 = np.linspace(0, 1, 5)  # Start, stop, num of points
print("Using linspace:", arr3)

In [None]:
# 2. Array attributes
print("\n2. Array attributes:")
print("Shape:", arr1.shape)
print("Dimension:", arr1.ndim)
print("Data type:", arr1.dtype)

In [None]:
# 3. Array operations
print("\n3. Array operations:")
arr4 = np.array([1, 2, 3])
arr5 = np.array([4, 5, 6])
print("Addition:", arr4 + arr5)
print("Multiplication:", arr4 * arr5)
print("Square root:", np.sqrt(arr4))

In [None]:
# 4. Broadcasting
print("\n4. Broadcasting:")
arr6 = np.array([[1, 2, 3], [4, 5, 6]])
print("Original array:")
print(arr6)
print("Add 10 to each element:")
print(arr6 + 10)
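
The cell above broadcasts a scalar. A minimal sketch of broadcasting between arrays of different shapes might look like this; the names `matrix`, `row`, and `column` are illustrative and not part of the original notebook.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])                # shape (3,)
column = np.array([[100], [200]])           # shape (2, 1)

# The row is stretched across both rows of the matrix.
row_sum = matrix + row
print(row_sum)

# The column is stretched across all three columns of the matrix.
col_sum = matrix + column
print(col_sum)
```

Shapes are compared from the trailing dimension backwards; two dimensions are compatible when they are equal or when one of them is 1.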

In [None]:
# 5. Indexing and slicing
print("\n5. Indexing and slicing:")
arr7 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("Original array:")
print(arr7)
print("Element at row 1, column 2:", arr7[1, 2])
print("Second row:", arr7[1, :])
print("Second column:", arr7[:, 1])
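
Beyond single rows and columns, slices can be combined across dimensions, and a boolean mask can select elements by condition. The following is a small sketch with illustrative names (`grid`, `top_right`, `evens`).

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Rows 0-1 and columns 1-2: a 2x2 block from the top-right corner.
top_right = grid[:2, 1:]
print(top_right)

# A boolean mask keeps the elements where the condition is True
# and returns them as a 1-D array.
evens = grid[grid % 2 == 0]
print(evens)
```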

In [None]:
# 6. Reshaping arrays
print("\n6. Reshaping arrays:")
arr8 = np.arange(12)
print("Original array:", arr8)
reshaped = arr8.reshape(3, 4)
print("Reshaped to 3x4:")
print(reshaped)
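
One detail worth seeing in code: for a contiguous array such as the one above, `reshape` usually returns a view onto the same data rather than a copy. The sketch below, with illustrative names, also uses `-1` to let NumPy infer one dimension.

```python
import numpy as np

flat = np.arange(12)

# -1 asks NumPy to infer the missing dimension (12 / 3 = 4 columns).
table = flat.reshape(3, -1)
print(table.shape)

# Both arrays share one buffer here, so a write through the view
# is visible in the original array.
table[0, 0] = 99
print(flat[0])
print(np.shares_memory(flat, table))
```

When an independent array is needed, calling `.copy()` on the reshaped result avoids this sharing.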

In [None]:
# 7. Statistical operations
print("\n7. Statistical operations:")
arr9 = np.array([1, 2, 3, 4, 5])
print("Mean:", np.mean(arr9))
print("Median:", np.median(arr9))
print("Standard deviation:", np.std(arr9))
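
The same functions accept an `axis` argument on multi-dimensional arrays. The sketch below uses an illustrative `scores` array, and also shows that `np.std` computes the population standard deviation by default, with `ddof=1` giving the sample version.

```python
import numpy as np

scores = np.array([[1, 2, 3], [4, 5, 6]])

# axis=0 collapses the rows, giving one mean per column.
col_means = scores.mean(axis=0)
# axis=1 collapses the columns, giving one mean per row.
row_means = scores.mean(axis=1)
print(col_means, row_means)

values = np.array([1, 2, 3, 4, 5])
population_std = np.std(values)        # divides by n
sample_std = np.std(values, ddof=1)    # divides by n - 1
print(population_std, sample_std)
```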

In [None]:
# 8. Linear algebra operations
print("\n8. Linear algebra operations:")
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print("Matrix multiplication:")
print(np.dot(A, B))
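
The feature list also mentions solving linear equations and computing eigenvalues. A minimal sketch using `np.linalg` could look like this; the matrices and the names `coeffs`, `rhs`, and `sym` are made up for illustration.

```python
import numpy as np

# Solve the system  2x + y = 3,  x + 3y = 5.
coeffs = np.array([[2.0, 1.0], [1.0, 3.0]])
rhs = np.array([3.0, 5.0])
solution = np.linalg.solve(coeffs, rhs)
print("Solution:", solution)

# The @ operator is equivalent to np.dot for 2-D arrays.
print("Check:", coeffs @ solution)

# eigvalsh is intended for symmetric matrices and returns
# the eigenvalues in ascending order.
sym = np.array([[2.0, 1.0], [1.0, 2.0]])
eigenvalues = np.linalg.eigvalsh(sym)
print("Eigenvalues:", eigenvalues)
```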

In [None]:
# 9. Random number generation
print("\n9. Random number generation:")
random_arr = np.random.rand(3, 3)
print("Random 3x3 array:")
print(random_arr)
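
The output above changes on every run. When reproducible results are needed, one option is NumPy's `Generator` interface with a fixed seed, sketched below. The seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Two generators created with the same seed produce the same stream.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

sample_a = rng_a.random((3, 3))   # uniform floats in [0, 1)
sample_b = rng_b.random((3, 3))

print(sample_a)
print("Identical:", np.array_equal(sample_a, sample_b))
```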

### 2. Pandas

#### Pandas is a powerful library for data manipulation and analysis in Python. It's built on top of NumPy and provides high-performance, easy-to-use data structures and tools for working with structured data. The two primary data structures in Pandas are Series and DataFrame.

#### Key concepts and features of Pandas:

`1. Series and DataFrame:`
   `- Series: A one-dimensional labeled array that can hold data of any type.`
   `- DataFrame: A two-dimensional labeled data structure with columns of potentially different types.`

`2. Data Import/Export:
   Pandas can read and write data in various formats, including CSV, Excel, SQL databases, and JSON.`

`3. Data Inspection:
   Methods like head(), tail(), info(), and describe() help you quickly understand your data.`

`4. Data Selection and Indexing:
   Pandas provides powerful ways to select and index data using labels or integer positions.`

`5. Data Cleaning:
   Functions for handling missing data, removing duplicates, and replacing values.`

`6. Data Transformation:
   Operations like grouping, aggregating, and pivoting data.`

`7. Merging and Joining:
   Combining multiple datasets using various join operations.`

`8. Time Series Functionality:
   Extensive capabilities for working with date, time, and time-indexed data.`

`9. Data Visualization:
   Integration with plotting libraries for quick data visualization.`

In [None]:
import pandas as pd
import numpy as np

# 1. Creating Pandas objects
print("1. Creating Pandas objects:")

# Creating a Series
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print("Series:")
print(s)

In [None]:
# Creating a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': pd.Timestamp('20230101'),
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'D': np.array([3] * 4, dtype='int32'),
    'E': pd.Categorical(["test", "train", "test", "train"]),
    'F': 'foo'
})
print("DataFrame:")
print(df)

In [None]:
# 2. Viewing data
print("2. Viewing data:")
print("First few rows:")
print(df.head())

In [None]:
print("DataFrame info:")
df.info()

In [None]:
print("DataFrame description:")
print(df.describe())

In [None]:
# 3. Selection
print("3. Selection:")
print("Selecting column 'A':")
print(df['A'])

In [None]:
print("Selecting rows 0 through 2:")
print(df[0:3])

In [None]:
print("Selecting by label (loc):")
print(df.loc[1:2, ['A', 'B']])

In [None]:
print("Selecting by position (iloc):")
print(df.iloc[1:3, 0:2])
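
The difference between `loc` and `iloc` is easier to see when the index is not made of integers. In the sketch below, the `people` DataFrame and its values are hypothetical.

```python
import pandas as pd

people = pd.DataFrame(
    {'age': [25, 32, 47], 'city': ['Oslo', 'Lima', 'Pune']},
    index=['a', 'b', 'c'],
)

# loc slices by label and includes the end label 'b'.
by_label = people.loc['a':'b', ['age']]
print(by_label)

# iloc slices by position and excludes the end position 1.
by_position = people.iloc[0:1, [0]]
print(by_position)

# A single label pair returns a scalar.
print(people.loc['c', 'city'])
```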

In [None]:
# 4. Data manipulation
print("\n4. Data manipulation:")

# Adding a new column
df['G'] = df['A'] + df['D']
print("Added new column 'G':")
print(df)

In [None]:
# Filtering data
print("\nFiltering data where A > 2:")
print(df[df['A'] > 2])

In [None]:
# Grouping and aggregating
print("\nGrouping by 'E' and calculating mean of 'A':")
# 'E' is categorical; observed=True groups only by categories that actually occur
print(df.groupby('E', observed=True)['A'].mean())
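
Pivoting, mentioned in the feature list, is not shown above. A minimal sketch of aggregating with several functions and pivoting from long to wide format follows; the `sales` data is invented for illustration.

```python
import pandas as pd

sales = pd.DataFrame({
    'region': ['North', 'North', 'South', 'South'],
    'quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'revenue': [100, 120, 80, 95],
})

# Several aggregations per group in one call.
totals = sales.groupby('region')['revenue'].agg(['sum', 'mean'])
print(totals)

# Long to wide: one row per region, one column per quarter.
wide = sales.pivot(index='region', columns='quarter', values='revenue')
print(wide)
```

`pivot` expects each index/column pair to appear once; `pivot_table` is the usual choice when duplicates need to be aggregated.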

In [None]:
# 5. Handling missing data
print("5. Handling missing data:")
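df2 = df.copy()
# Cast to float first: an int64 column cannot hold NaN, and recent
# pandas versions warn about (or reject) the implicit upcast.
df2['A'] = df2['A'].astype('float64')
df2.loc[0, 'A'] = np.nan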
print("DataFrame with NaN:")
print(df2)

In [None]:
print("Dropping rows with NaN:")
print(df2.dropna())

In [None]:
print("Filling NaN with 0:")
# Fill only column 'A', the one that contains NaN
print(df2.fillna({'A': 0}))
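
The feature list also mentions removing duplicates and replacing values. Those two steps could be sketched as follows, using a made-up `raw` DataFrame in which `'n/a'` stands in for a placeholder value.

```python
import pandas as pd

raw = pd.DataFrame({
    'name': ['Ann', 'Bob', 'Bob', 'Cy'],
    'status': ['ok', 'n/a', 'n/a', 'ok'],
})

# Drop rows that are exact duplicates of an earlier row.
deduped = raw.drop_duplicates().reset_index(drop=True)
print(deduped)

# Replace a placeholder value in one column only.
cleaned = deduped.replace({'status': {'n/a': 'unknown'}})
print(cleaned)
```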

In [None]:
# 6. Merging DataFrames
print("6. Merging DataFrames:")
df3 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                    'B': ['B0', 'B1', 'B2']},
                    index=['K0', 'K1', 'K2'])

df4 = pd.DataFrame({'C': ['C0', 'C1', 'C2'],
                    'D': ['D0', 'D1', 'D2']},
                    index=['K0', 'K2', 'K3'])

print("Merging df3 and df4:")
print(pd.merge(df3, df4, left_index=True, right_index=True, how='outer'))
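
The example above joins on the index. Merging on a shared key column is just as common; the sketch below contrasts an inner and a left join on hypothetical `customers` and `orders` tables.

```python
import pandas as pd

customers = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'name': ['Ann', 'Bob', 'Cy'],
})
orders = pd.DataFrame({
    'customer_id': [1, 1, 3, 4],
    'amount': [20, 35, 15, 50],
})

# Inner join: only keys present in both tables survive.
inner = pd.merge(customers, orders, on='customer_id', how='inner')
print(inner)

# Left join: every customer is kept; missing orders become NaN.
left = pd.merge(customers, orders, on='customer_id', how='left')
print(left)
```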

In [None]:
# 7. Time series
print("7. Time series:")
rng = pd.date_range('1/1/2023', periods=100, freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
print("Time series data:")
print(ts.head())

In [None]:
print("Resampling to monthly frequency:")
print(ts.resample('ME').mean())
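
Two other time series operations that come up often are slicing by date strings and rolling windows. The sketch below uses a small deterministic series (`readings`, an illustrative name) so the results are easy to check by hand.

```python
import numpy as np
import pandas as pd

days = pd.date_range('2023-01-01', periods=10, freq='D')
readings = pd.Series(np.arange(10, dtype=float), index=days)

# Date-string slices on a DatetimeIndex include both endpoints.
first_week = readings.loc['2023-01-01':'2023-01-07']
print(len(first_week))

# 3-day rolling mean; the first two entries are NaN because
# the window is not yet full.
rolling_mean = readings.rolling(window=3).mean()
print(rolling_mean.head())
```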

In [None]:
# 8. Reading and writing data
print("8. Reading and writing data:")
# Writing to CSV (creates 'example.csv' in the current working directory)
df.to_csv('example.csv')
print("DataFrame written to 'example.csv'")

In [None]:
# Reading from CSV
df_read = pd.read_csv('example.csv', index_col=0)
print("DataFrame read from 'example.csv':")
print(df_read.head())
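
To try the CSV round trip without touching the file system, an in-memory text buffer can stand in for a file. This is a minimal sketch; the `frame` data is invented for illustration.

```python
import io
import pandas as pd

frame = pd.DataFrame({'id': [1, 2, 3], 'label': ['x', 'y', 'z']})

# Write CSV text into an in-memory buffer instead of a file.
buffer = io.StringIO()
frame.to_csv(buffer, index=False)
print(buffer.getvalue())

# Rewind the buffer and read it back as a new DataFrame.
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```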

In [None]:
# 9. Data visualization
print("9. Data visualization:")
print("To visualize data, you can use df.plot() or integrate with matplotlib.")
#print("Example: df['A'].plot(kind='bar')")

#### Why Pandas is crucial for data science:

`- Efficient data handling: Pandas builds on NumPy arrays, so vectorized column operations stay fast on large in-memory datasets.`

`- Data preprocessing: It provides tools essential for cleaning and preparing data for analysis.`

`- Flexibility: It can handle various data types and formats commonly used in data science.`

`- Integration: Pandas works well with other data science libraries like NumPy, Matplotlib, and scikit-learn.`