# Data Toolkit Assignment

1. What is the difference between multithreading and multiprocessing?
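   - **Multithreading**: Runs multiple threads within a single process, all sharing the same memory space. In CPython the GIL lets only one thread execute Python bytecode at a time, so threads are best for I/O-bound tasks such as network requests or file access, where most of the time is spent waiting.
   - **Multiprocessing**: Runs multiple processes, each with its own interpreter and memory space. Processes can run in parallel on separate CPU cores, which makes them best for CPU-bound tasks, at the cost of higher startup and inter-process communication overhead.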
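
As a minimal sketch of the distinction, the standard-library `concurrent.futures` module offers a thread pool and a process pool with the same interface. The helper names `io_task`, `cpu_task`, `run_io_in_threads`, and `run_cpu_in_processes` are made up for illustration, and `time.sleep` stands in for a real I/O wait.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def io_task(delay):
    """Simulate an I/O wait (hypothetical stand-in for a network or disk call)."""
    time.sleep(delay)
    return delay


def cpu_task(n):
    """Pure-Python arithmetic that keeps the interpreter busy."""
    return sum(i * i for i in range(n))


def run_io_in_threads(delays):
    # Threads overlap their waiting time because sleeping releases the GIL.
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        return list(pool.map(io_task, delays))


def run_cpu_in_processes(sizes):
    # Each worker process has its own interpreter and memory space.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(cpu_task, sizes))


if __name__ == "__main__":
    print(run_io_in_threads([0.1, 0.1, 0.1]))
    print(run_cpu_in_processes([10_000, 20_000]))
```

On platforms that start worker processes with the spawn method, the process-pool functions may need to live in an importable module rather than in a notebook cell.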

2. What are the challenges associated with memory management in Python?
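   - **Garbage collection overhead**: the cyclic garbage collector periodically scans container objects, which can affect performance.
   - **Reference cycles**: objects that refer to each other never reach a reference count of zero, so they are freed only when the cyclic garbage collector runs.
   - **Memory leaks** can occur when objects stay unintentionally referenced, for example in global variables, caches, or closures.
   - **Large objects**: freed memory may not be returned to the operating system immediately, so a process's memory footprint can stay high after large objects are deleted.
   - **GIL (Global Interpreter Lock)**: CPython's reference counting is not thread-safe on its own, so the GIL serializes threads, which limits true parallel execution.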
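
The reference-cycle point can be sketched with the standard `gc` and `weakref` modules. The `Node` class and `make_cycle` helper are hypothetical names used only for this illustration; automatic collection is switched off briefly so the effect is visible.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at another Node."""

    def __init__(self, name):
        self.name = name
        self.partner = None


def make_cycle():
    a = Node("a")
    b = Node("b")
    a.partner = b  # a -> b
    b.partner = a  # b -> a, which closes the cycle
    return weakref.ref(a)  # a weak reference does not keep 'a' alive


gc.disable()  # pause automatic cycle collection for the demonstration
ref = make_cycle()

# The local names are gone, but each object still holds the other,
# so reference counting alone cannot free them.
alive_before = ref() is not None

collected = gc.collect()  # run the cyclic garbage collector explicitly
alive_after = ref() is not None
gc.enable()

print(alive_before, alive_after)
```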

3. Write a Python program that logs an error message to a log file when a division by zero exception occurs.

In [None]:
import logging

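# Configure logging: write ERROR and above to error.log with a timestamp
logging.basicConfig(
    filename='error.log',
    level=logging.ERROR,
    format='%(asctime)s - %(levelname)s - %(message)s',
)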

try:
    a = 10
    b = 0
    result = a / b
except ZeroDivisionError as e:
    # Pass the exception as an argument so logging formats the message lazily
    logging.error("Division by zero occurred: %s", e)
    print("Error logged to file.")

4. Write a Python program that reads from one file and writes its content to another file.

In [None]:
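# Read the whole source file into memory (fine for small text files)
with open('source.txt', 'r') as src:
    data = src.read()

# Write the content to the destination, overwriting it if it already exists
with open('destination.txt', 'w') as dest:
    dest.write(data)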

print("File copied successfully.")

5. Write a program that handles both IndexError and KeyError using a try-except block.

In [None]:
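lst = [1, 2, 3]
d = {"a": 1}

# Each lookup gets its own try block; in a single block the IndexError
# would skip the dictionary lookup and the KeyError handler would never run.
for lookup in (lambda: lst[5], lambda: d["b"]):
    try:
        print(lookup())
    except IndexError:
        print("Index out of range error caught.")
    except KeyError:
        print("Key not found error caught.")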

6. What are the differences between NumPy arrays and Python lists?
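   - NumPy arrays store **homogeneous** data (a single dtype); lists can store **heterogeneous** data.
   - NumPy arrays are generally **faster and more memory-efficient** for numerical work because their elements sit in one contiguous, typed block of memory rather than as separate Python objects.
   - NumPy supports **vectorized operations** such as `arr * 2`; with lists, element-wise arithmetic needs an explicit loop or comprehension.
   - NumPy arrays have a **fixed size** once created, so adding elements creates a new array; lists are **dynamic** and grow in place.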
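
A short sketch of these differences, using made-up sample values:

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

# The same operator means different things for the two types.
doubled_list = values * 2  # list repetition: [1, 2, 3, 1, 2, 3]
doubled_arr = arr * 2      # element-wise multiplication: [2 4 6]

# Mixed inputs are coerced to one common dtype.
mixed = np.array([1, 2.5, 3])  # every element becomes float64

# Lists grow in place; np.append returns a new array and leaves arr unchanged.
values.append(4)
bigger = np.append(arr, 4)

print(doubled_list, doubled_arr, mixed.dtype, arr.size, bigger.size)
```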

7. Explain the difference between apply() and map() in Pandas.
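   - `Series.map()` applies a function element-wise to a **Series**; it also accepts a dict or Series as a lookup table. (pandas 2.1 added `DataFrame.map()` for element-wise operations on DataFrames, replacing `applymap()`.)
   - apply() works on **Series and DataFrames**; on a DataFrame it passes each column (`axis=0`) or each row (`axis=1`) to the function.
   - apply() is more flexible than map() because the function can see a whole row or column and return either a scalar or a Series.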
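
A small sketch of both methods on an illustrative DataFrame; the `price` and `qty` columns are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({"price": [10, 20, 30], "qty": [1, 2, 3]})

# Series.map: element-wise with a function...
squared = df["price"].map(lambda x: x ** 2)

# ...or with a dict used as a lookup table (unmatched values become NaN).
labels = df["qty"].map({1: "one", 2: "two"})

# DataFrame.apply: the function receives a whole column (axis=0, the default)...
col_totals = df.apply(lambda col: col.sum())

# ...or a whole row (axis=1).
row_totals = df.apply(lambda row: row["price"] * row["qty"], axis=1)

print(squared.tolist(), labels.tolist(), col_totals.to_dict(), row_totals.tolist())
```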

8. Create a histogram using Seaborn to visualize a distribution.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

data = np.random.randn(1000)
sns.histplot(data, kde=True)
plt.title("Histogram of Distribution")
plt.show()

9. Use Pandas to load a CSV file and display its first 5 rows.

In [None]:
import pandas as pd

df = pd.read_csv('data.csv')
print(df.head())

10. Calculate the correlation matrix using Seaborn and visualize it with a heatmap.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Example DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 4, 3, 2, 1],
    'C': [2, 3, 4, 5, 6]
})

corr = df.corr()
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap")
plt.show()