<a href="https://colab.research.google.com/github/Seyjuti8884/pwskills_assignment/blob/main/DataToolkit.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



---

### **1. Demonstrate three different methods for creating identical 2D arrays in NumPy.**

```python
import numpy as np

# Method 1: np.array with nested lists
arr1 = np.array([[1, 2, 3], [1, 2, 3]])

# Method 2: np.ones, then broadcast-multiply by the row
arr2 = np.ones((2, 3), dtype=int) * np.array([1, 2, 3])

# Method 3: np.tile repeats the row twice along axis 0
arr3 = np.tile(np.array([1, 2, 3]), (2, 1))

print(arr1)
print(arr2)
print(arr3)
```

**Output:**

```
[[1 2 3]
 [1 2 3]]
[[1 2 3]
 [1 2 3]]
[[1 2 3]
 [1 2 3]]
```

(Note: All three methods produce the same 2x3 array. Method 2 relies on broadcasting the row across both rows of ones; method 3 repeats the row explicitly.)
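A quick way to confirm that three constructions agree is `np.array_equal`. The sketch below assumes both rows are `[1, 2, 3]`, a choice made here so that all three methods can match.

```python
import numpy as np

# Three constructions of the same 2x3 array (both rows are [1, 2, 3])
a = np.array([[1, 2, 3], [1, 2, 3]])
b = np.ones((2, 3), dtype=int) * np.array([1, 2, 3])  # broadcasting
c = np.tile(np.array([1, 2, 3]), (2, 1))              # repeat the row twice

# array_equal checks both shape and element values
all_equal = np.array_equal(a, b) and np.array_equal(b, c)
print(all_equal)
```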

---

### **2. Generate an array of 100 evenly spaced numbers between 1 and 10 and reshape it into a 2D array.**

```python
arr = np.linspace(1, 10, 100)
arr_2d = arr.reshape(10, 10)
print(arr_2d)
```

**Output:** A 10x10 array of evenly spaced values from 1 to 10 (abbreviated here):

```
[[ 1.          1.09090909  1.18181818 ...  1.63636364  1.72727273  1.81818182]
 ...
 [ 9.18181818  9.27272727  9.36363636 ...  9.81818182  9.90909091 10.        ]]
```
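Because `np.linspace` includes both endpoints by default, 100 samples span 99 intervals, so the spacing is 9/99 (about 0.0909) rather than 0.09. A small check, using the `retstep=True` option to return the spacing alongside the samples:

```python
import numpy as np

# retstep=True also returns the spacing between consecutive samples
values, step = np.linspace(1, 10, 100, retstep=True)
grid = values.reshape(10, 10)

print(step)          # spacing: (10 - 1) / (100 - 1)
print(grid[0, -1])   # last value of the first row
print(grid[-1, 0])   # first value of the last row
```

Each row holds ten consecutive samples, so the first row ends at 1 + 9 * step and the last row starts at 1 + 90 * step.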

---

### **3. Explain the following terms:**

**a) Difference between `np.array`, `np.asarray`, and `np.asanyarray`:**

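* `np.array`: Copies its input by default (`copy=True`), so the result is a new array even when the input is already an ndarray, unless copying is explicitly disabled with `copy=False`.
* `np.asarray`: Converts the input to an ndarray but does not copy when the input is already an ndarray of a compatible dtype. Subclasses are converted to the base `ndarray`.
* `np.asanyarray`: Like `asarray`, but passes ndarray subclasses (e.g., `np.matrix`) through unchanged.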
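The differences can be observed with identity and type checks. In this minimal sketch, `Tagged` is a hypothetical ndarray subclass defined only for the demonstration.

```python
import numpy as np

class Tagged(np.ndarray):
    """Hypothetical ndarray subclass used only for this demonstration."""

base = np.arange(3)
sub = base.view(Tagged)

copied = np.array(base)        # copies by default
same = np.asarray(base)        # already an ndarray: returned as-is
as_base = np.asarray(sub)      # subclass is converted to a plain ndarray
kept = np.asanyarray(sub)      # subclass passes through unchanged

print(copied is base, np.shares_memory(copied, base))
print(same is base)
print(type(as_base).__name__, type(kept).__name__)
```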

**b) Difference between Deep copy and Shallow copy:**

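* **Deep copy:** Creates a new object and recursively copies everything it contains, so later changes to either object do not affect the other. For numeric NumPy arrays, `arr.copy()` returns an array with its own data buffer.
* **Shallow copy:** Creates a new outer object whose elements still reference the original objects, so in-place changes to mutable elements show up in both. A NumPy view (for example, a slice) behaves similarly because it shares the original data buffer.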
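A short illustration of both cases, first with Python's standard `copy` module on a nested list and then with a NumPy slice versus `copy()`. The variable names are chosen here for the example.

```python
import copy
import numpy as np

nested = [[1, 2], [3, 4]]
shallow = copy.copy(nested)      # new outer list, same inner lists
deep = copy.deepcopy(nested)     # inner lists are copied too

nested[0][0] = 99
print(shallow[0][0], deep[0][0])  # 99 1

a = np.arange(5)
view = a[1:4]                    # slice: shares a's data buffer
dup = a.copy()                   # independent data buffer
a[2] = -1
print(view[1], dup[2])           # -1 2
```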

---

### **4. Generate a 3x3 array with random floating-point numbers between 5 and 20. Then, round each number to 2 decimal places.**

```python
arr = np.random.uniform(5, 20, (3, 3))
rounded_arr = np.round(arr, 2)
print(rounded_arr)
```

**Example Output:**

```
[[14.23  7.89 18.66]
 [10.58 12.34 19.01]
 [ 5.23  6.75  9.89]]
```
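The values differ on every run because no seed is set. If reproducible output is wanted, one option is NumPy's `Generator` API; the seed value `42` below is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # seed value is an arbitrary assumption
arr = rng.uniform(5, 20, size=(3, 3))  # samples from the half-open range [5, 20)
rounded_arr = np.round(arr, 2)
print(rounded_arr)
```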

---

### **5. Create a NumPy array with random integers between 1 and 10 of shape (5, 6). Perform:**

```python
arr = np.random.randint(1, 11, size=(5, 6))

# a) Extract even integers
evens = arr[arr % 2 == 0]

# b) Extract odd integers
odds = arr[arr % 2 != 0]

print("Original Array:\n", arr)
print("Even Integers:", evens)
print("Odd Integers:", odds)
```

**Example Output:**

```
Original Array:
[[ 4  7 10  3  1  2]
 [ 6  9  5  4  2  8]
 [ 1 10  3  7  9  6]
 [ 8  5  2  4  1  7]
 [ 3  6  9  2  5 10]]

Even Integers: [ 4 10  2  6  4  2  8 10  6  8  2  4  6  2 10]

Odd Integers: [7 3 1 9 5 1 3 7 9 5 1 7 3 9 5]
```
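Boolean-mask indexing returns a flattened 1D array, which is why the results above lose the (5, 6) shape. If the positions matter, `np.where` or `np.argwhere` can be used instead. A sketch on the first row of the example array:

```python
import numpy as np

row = np.array([[4, 7, 10, 3, 1, 2]])   # first row of the example array, kept 2D

mask = row % 2 == 0
flat_evens = row[mask]                   # 1D result: shape is lost
kept_shape = np.where(mask, row, 0)      # same shape, odd entries replaced by 0
positions = np.argwhere(mask)            # (row, column) index of each even entry

print(flat_evens)
print(kept_shape)
print(positions)
```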

---

### **6. Create a 3D NumPy array of shape (3, 3, 3) containing random integers between 1 and 10. Perform:**

```python
arr1 = np.random.randint(1, 11, size=(3, 3, 3))
arr2 = np.random.randint(1, 11, size=(3, 3, 3))

# a) Indices of maximum values along each depth level (axis=2)
max_indices = np.argmax(arr1, axis=2)

# b) Element-wise multiplication of both arrays
product = arr1 * arr2

print("Array 1:\n", arr1)
print("Array 2:\n", arr2)
print("Indices of max values along depth:\n", max_indices)
print("Element-wise multiplication:\n", product)
```

**Example Output:**

```
Array 1:
[[[ 1  7  3]
  [10  2  6]
  [ 8  4  9]]

 [[ 3  6  5]
  [ 1 10  2]
  [ 7  2  4]]

 [[ 6  3  9]
  [ 8  1  5]
  [ 2  7  6]]]

Array 2:
[[[10  4  3]
  [ 7  2  9]
  [ 6  3  1]]

 [[ 2  5  4]
  [ 3  8  6]
  [ 1  5  2]]

 [[ 7  1  8]
  [ 4  2 10]
  [ 6  7  4]]]

Indices of max values along depth:
[[1 0 2]
 [1 1 0]
 [2 0 1]]

Element-wise multiplication:
[[[10 28  9]
  [70  4 54]
  [48 12  9]]

 [[ 6 30 20]
  [ 3 80 12]
  [ 7 10  8]]

 [[42  3 72]
  [32  2 50]
  [12 49 24]]]
```
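With `axis=2`, `np.argmax` collapses the last dimension, so a (3, 3, 3) input gives a (3, 3) result holding the position of the largest value in each innermost row. The sketch below checks this on the first block of Array 1 and recovers the maxima themselves with `np.take_along_axis`.

```python
import numpy as np

# First block of Array 1 from the example output, kept 3D with shape (1, 3, 3)
block = np.array([[[1, 7, 3],
                   [10, 2, 6],
                   [8, 4, 9]]])

idx = np.argmax(block, axis=2)   # position of the max within each innermost row
# Use the indices to pull out the maximum values themselves
maxima = np.take_along_axis(block, idx[..., None], axis=2).squeeze(axis=2)

print(idx)      # [[1 0 2]]
print(maxima)   # maxima are 7, 10 and 9
```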

---

### **7. Clean and transform the 'Phone' column in the sample dataset to remove non-numeric characters and convert it to a numeric data type. Also, display the table attributes and data types of each column.**

```python
import pandas as pd

# Sample dataset
data = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Phone': ['(123) 456-7890', '987-654-3210']
})

# Clean Phone column
data['Phone'] = data['Phone'].str.replace(r'\D', '', regex=True).astype('int64')

# Display structure: info() prints the summary itself and returns None
data.info()
print(data)
```

**Output:**

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    2 non-null      object
 1   Phone   2 non-null      int64
...

    Name       Phone
0  Alice  1234567890
1    Bob  9876543210
```
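Converting with `.astype('int64')` raises an error if any entry has no digits left after cleaning, because an empty string cannot be parsed as an integer. One possible alternative, sketched on a hypothetical sample with an `'N/A'` entry, is `pd.to_numeric` with `errors='coerce'` and the nullable `Int64` dtype:

```python
import pandas as pd

# Hypothetical sample that includes an entry with no digits at all
phones = pd.Series(['(123) 456-7890', '987-654-3210', 'N/A'])

digits = phones.str.replace(r'\D', '', regex=True)   # 'N/A' becomes ''
# Unparseable entries become missing values instead of raising
cleaned = pd.to_numeric(digits, errors='coerce').astype('Int64')
print(cleaned)
```

The missing entry shows up as `<NA>` while the valid numbers stay integers.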

---

### **8. Perform the following tasks using the people dataset:**

```python
# a) Read 'data.csv', skipping the first 50 data rows.
#    range(1, 51) keeps line 0 (the header) so the column names survive.
df = pd.read_csv("data.csv", skiprows=range(1, 51))

# b) Read only selected columns
cols = ['Last Name', 'Gender', 'Email', 'Phone', 'Salary']
df_filtered = pd.read_csv("data.csv", skiprows=range(1, 51), usecols=cols)

# c) Display first 10 rows
print(df_filtered.head(10))

# d) Extract 'Salary' column as Series and display last 5 values
salary_series = df_filtered['Salary']
print(salary_series.tail(5))
```

**Note:** Replace `"data.csv"` with the path to the actual CSV file once downloaded.
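One detail worth checking when skipping rows: an integer `skiprows` counts lines from the top of the file, header included, whereas a range starting at 1 leaves the header in place. The sketch below uses a tiny in-memory CSV with made-up values as a stand-in for the real file.

```python
import io
import pandas as pd

# Tiny in-memory CSV standing in for the real file (hypothetical values)
csv_text = "Last Name,Salary\nDuke,70000\nSmith,90000\nLee,65000\n"

# An integer skips the first N lines of the file, header included
no_header = pd.read_csv(io.StringIO(csv_text), skiprows=2)

# A range starting at 1 keeps line 0 (the header) and skips data rows only
with_header = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 3))

print(list(no_header.columns))
print(list(with_header.columns))
```

In the first case a data row is promoted to the header, so selecting columns by name with `usecols` would fail.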

---

### **9. Filter and select rows from the People\_Dataset where:**

* `'Last Name'` contains `'Duke'`
* `'Gender'` is `'Female'`
* `'Salary'` is less than 85000

```python
filtered_rows = df_filtered[
    # na=False treats missing last names as non-matches
    (df_filtered['Last Name'].str.contains("Duke", case=False, na=False)) &
    (df_filtered['Gender'].str.lower() == 'female') &
    (df_filtered['Salary'] < 85000)
]

print(filtered_rows)
```

**Output:** Only the rows that match all three conditions are shown (the exact output depends on the dataset contents).
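Since the real output depends on the dataset, the same filter can be tried on a small hypothetical DataFrame (all names and salaries below are made up) to see how the three conditions combine:

```python
import pandas as pd

# Hypothetical rows, including a missing last name
people = pd.DataFrame({
    'Last Name': ['Duke', 'Dukes', 'Smith', None],
    'Gender': ['Female', 'Male', 'Female', 'Female'],
    'Salary': [60000, 70000, 50000, 40000],
})

mask = (
    people['Last Name'].str.contains('Duke', case=False, na=False)
    & (people['Gender'].str.lower() == 'female')
    & (people['Salary'] < 85000)
)
matches = people[mask]
print(matches)
```

Only the first row satisfies all three conditions: `'Dukes'` contains `'Duke'` but fails the gender test, and the missing last name counts as a non-match.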

---

### **10. Create a 7×5 DataFrame in Pandas using a Series of 35 random integers between 1 and 6.**

```python
import pandas as pd
import numpy as np

series = pd.Series(np.random.randint(1, 7, 35))  # upper bound 7 is exclusive
df = pd.DataFrame(series.values.reshape(7, 5))   # 35 values -> 7 rows x 5 columns
print(df)
```

---

### **11. Create two different Series and join them into a DataFrame with renamed columns.**

```python
s1 = pd.Series(np.random.randint(10, 51, 50))
s2 = pd.Series(np.random.randint(100, 1001, 50))
df = pd.concat([s1, s2], axis=1)
df.columns = ['col1', 'col2']
print(df)
```

---

### **12. Perform operations using the people dataset: delete columns and drop rows with missing values.**

```python
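# Reload the people dataset: `df` was reassigned in the previous answers
people = pd.read_csv("data.csv")
# Drop the unwanted columns, then remove rows that still have missing values
df_cleaned = people.drop(columns=['Email', 'Phone', 'Date of birth'], errors='ignore')
df_cleaned = df_cleaned.dropna()
print(df_cleaned)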
```
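The order of the two steps matters: dropping the columns first means a missing value in a removed column no longer costs a row. A sketch on a hypothetical three-row frame (names and values are made up):

```python
import numpy as np
import pandas as pd

people = pd.DataFrame({
    'First Name': ['Ann', 'Raj', 'Mia'],
    'Email': ['a@x.org', None, 'm@x.org'],
    'Salary': [50000, 62000, np.nan],
})

# Columns first, then rows: Raj survives because Email is already gone
cols_then_rows = people.drop(columns=['Email', 'Phone'], errors='ignore').dropna()

# Rows first, then columns: Raj is lost because of the missing Email
rows_then_cols = people.dropna().drop(columns=['Email', 'Phone'], errors='ignore')

print(len(cols_then_rows), len(rows_then_cols))
```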

---

### **13. Create a scatter plot with x and y, red markers, and styled lines.**

```python
import matplotlib.pyplot as plt

x = np.random.rand(100)
y = np.random.rand(100)

plt.scatter(x, y, color='red', marker='o', label='Data Points')
plt.axhline(y=0.5, color='blue', linestyle='--', label='y = 0.5')
plt.axvline(x=0.5, color='green', linestyle=':', label='x = 0.5')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Advanced Scatter Plot of Random Values')
plt.legend()
plt.show()
```

---

### **14. Create a time-series dataset and dual-axis plot for Temperature and Humidity.**

```python
import pandas as pd
import matplotlib.pyplot as plt

dates = pd.date_range(start='2023-01-01', periods=30)
temperature = np.random.randint(20, 40, 30)
humidity = np.random.randint(40, 90, 30)

df = pd.DataFrame({'Date': dates, 'Temperature': temperature, 'Humidity': humidity})

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()

ax1.plot(df['Date'], df['Temperature'], 'r-', label='Temperature')
ax2.plot(df['Date'], df['Humidity'], 'b-', label='Humidity')

ax1.set_xlabel('Date')
ax1.set_ylabel('Temperature', color='r')
ax2.set_ylabel('Humidity', color='b')
plt.title('Temperature and Humidity Over Time')
fig.legend(loc='upper right')  # show the labels defined on both axes
plt.show()
```

---

### **15. Plot a histogram with 30 bins and overlay a PDF.**

```python
import seaborn as sns
from scipy.stats import norm

data = np.random.normal(0, 1, 1000)
sns.histplot(data, bins=30, kde=True, stat="density", color='skyblue', label='Histogram')

xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
p = norm.pdf(x, data.mean(), data.std())
plt.plot(x, p, 'r', linewidth=2, label='PDF')

plt.xlabel('Value')
plt.ylabel('Frequency/Probability')
plt.title('Histogram with PDF Overlay')
plt.legend()
plt.show()
```
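The PDF can share an axis with the histogram because `stat="density"` normalizes the bars so that their total area is 1. The same normalization can be checked with plain NumPy via `np.histogram(..., density=True)`; the seed below is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed, an assumption
sample = rng.normal(0, 1, 1000)

heights, edges = np.histogram(sample, bins=30, density=True)
area = np.sum(heights * np.diff(edges))   # total area under the bars
print(area)
```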

---

### **16. Create a Seaborn scatter plot with quadrant-based coloring and legend.**

```python
import seaborn as sns

x = np.random.randn(100)
y = np.random.randn(100)
quadrants = ['Q1' if a > 0 and b > 0 else 'Q2' if a < 0 and b > 0
             else 'Q3' if a < 0 and b < 0 else 'Q4' for a, b in zip(x, y)]

df = pd.DataFrame({'x': x, 'y': y, 'Quadrant': quadrants})
sns.scatterplot(data=df, x='x', y='y', hue='Quadrant')

plt.xlabel('X')
plt.ylabel('Y')
plt.title('Quadrant-wise Scatter Plot')
plt.legend(title='Quadrant')
plt.show()
```
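The quadrant labels can also be assigned without a Python loop. A possible vectorized variant uses `np.select`, shown here on four fixed points chosen for illustration; as in the list comprehension above, anything not in Q1 to Q3 falls through to `'Q4'`.

```python
import numpy as np

x = np.array([1.0, -2.0, -0.5, 3.0])
y = np.array([2.0, 1.0, -1.5, -4.0])

# Conditions are checked in order; the first match decides the label
conditions = [(x > 0) & (y > 0), (x < 0) & (y > 0), (x < 0) & (y < 0)]
quadrants = np.select(conditions, ['Q1', 'Q2', 'Q3'], default='Q4')
print(quadrants)
```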

---

### **17. Bokeh: Plot a sine wave function with labels and grid.**

```python
from bokeh.plotting import figure, show, output_notebook
import numpy as np

output_notebook()
x = np.linspace(0, 4 * np.pi, 100)
y = np.sin(x)

p = figure(title="Sine Wave Function", x_axis_label='x', y_axis_label='sin(x)', width=600)
p.line(x, y, line_width=2)
p.xgrid.visible = True
p.ygrid.visible = True
show(p)
```

---

### **18. Bokeh: Generate a bar chart with colored bars and tooltips.**

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure, show
import pandas as pd
import numpy as np

categories = ['A', 'B', 'C', 'D', 'E']
values = np.random.randint(10, 100, size=5)
colors = ['#%02x%02x%02x' % (r, 100, 150) for r in values * 2]

source = ColumnDataSource(data=dict(categories=categories, values=values, colors=colors))

p = figure(x_range=categories, title="Random Categorical Bar Chart", toolbar_location=None)
p.vbar(x='categories', top='values', width=0.9, color='colors', source=source)

hover = HoverTool()
hover.tooltips = [("Category", "@categories"), ("Value", "@values")]
p.add_tools(hover)

p.xaxis.axis_label = "Category"
p.yaxis.axis_label = "Value"
show(p)
```

---

### **19. Plotly: Create a basic line plot of a randomly generated dataset.**

```python
import plotly.graph_objects as go
import numpy as np

x = np.arange(0, 100)
y = np.random.rand(100)

fig = go.Figure(data=go.Scatter(x=x, y=y, mode='lines'))
fig.update_layout(title='Simple Line Plot', xaxis_title='X', yaxis_title='Y')
fig.show()
```

---

### **20. Plotly: Create an interactive pie chart with labels and percentages.**

```python
import plotly.express as px

labels = ['A', 'B', 'C', 'D']
values = np.random.randint(10, 100, size=4)

fig = px.pie(names=labels, values=values, title='Interactive Pie Chart', hole=0)
fig.update_traces(textinfo='percent+label')
fig.show()
```

---



