# Data Toolkit

1. What is NumPy, and why is it widely used in Python ?
  - NumPy (Numerical Python) is a powerful open-source Python library used for numerical and scientific computing. It provides support for multidimensional arrays (ndarrays) and a wide range of mathematical functions to operate on these arrays efficiently. Unlike Python’s built-in lists, NumPy arrays are more compact, faster, and optimized for large-scale data processing.

  - Why Widely Used:

     - Efficiency: Handles large datasets faster than Python lists.

     - Convenience: Offers concise syntax for mathematical and matrix operations.

     - Foundation for Data Science & AI: Forms the backbone for advanced Python libraries in machine learning, data analysis, and scientific research.

  - Example

    import numpy as np

    # Create arrays
    arr1 = np.array([1, 2, 3, 4])
    arr2 = np.array([5, 6, 7, 8])

    # Perform vectorized operations
    print(arr1 + arr2)   # [ 6  8 10 12]
    print(arr1 * 2)      # [2 4 6 8]
    print(np.mean(arr1)) # 2.5

2. How does broadcasting work in NumPy ?
  - Broadcasting is the mechanism NumPy uses to perform arithmetic between arrays of different shapes. Instead of requiring explicit loops or copies, NumPy treats the smaller array as if it were stretched to the shape of the larger one. The stretched values are never physically copied, which keeps the operation fast and memory-efficient.

  - Broadcasting Rules

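     When operating on two arrays, NumPy compares their shapes one dimension at a time, starting from the trailing (rightmost) dimension. If one array has fewer dimensions, its missing leading dimensions are treated as size 1. For each pair of dimensions: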

     1. If the dimensions are equal → they are compatible.

     2. If one dimension is 1 → it can be stretched (broadcast) to match the other.

     3. If the dimensions differ and neither is 1 → broadcasting fails (error).

  - Example

     # Adding a scalar to an array
     import numpy as np

     arr = np.array([1, 2, 3])
     print(arr + 5)   # [6 7 8]

  - Here, the scalar 5 is treated as if it were [5, 5, 5].
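The three rules above can be checked on a slightly larger case. The sketch below is only illustrative; the names `matrix`, `row`, and `col` are made up for this example.

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)   # shape (2, 3): [[0 1 2], [3 4 5]]
row = np.array([10, 20, 30])          # shape (3,)
col = np.array([[100], [200]])        # shape (2, 1)

# (2, 3) + (3,): the trailing dimensions are equal, so row is added to every row.
print(matrix + row)   # [[10 21 32], [13 24 35]]

# (2, 3) + (2, 1): the size-1 axis is stretched across the 3 columns.
print(matrix + col)   # [[100 101 102], [203 204 205]]

# (2, 3) + (2,): the trailing dimensions are 3 and 2, and neither is 1, so this fails.
try:
    matrix + np.array([1, 2])
    broadcast_failed = False
except ValueError as exc:
    broadcast_failed = True
    print("Broadcast error:", exc)
```

If the last case is really meant as "add one value per row", reshaping the 1D array to shape (2, 1) makes it compatible.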

3.  What is a Pandas DataFrame ?
  - A Pandas DataFrame is a two-dimensional, tabular data structure in Python, provided by the Pandas library. Think of it like an Excel spreadsheet or a SQL table, where data is organized in rows and columns with labels.

     import pandas as pd

     # Creating a DataFrame from a dictionary
     data = {
         "Name": ["Alice", "Bob", "Charlie"],
         "Age": [25, 30, 35],
         "City": ["New York", "London", "Paris"]
     }

     df = pd.DataFrame(data)
     print(df)

     Output:

           Name  Age      City
     0    Alice   25  New York
     1      Bob   30    London
     2  Charlie   35     Paris

4. Explain the use of the groupby() method in Pandas ?
  - The groupby() method in Pandas is used to split data into groups based on one or more keys (columns), then apply an operation (like sum, mean, count, etc.), and finally combine the results.

     It follows the “Split → Apply → Combine” process:

       - Split – Divide the data into groups based on some criteria.

       - Apply – Perform a function (sum, mean, count, custom function) on each group.

       - Combine – Merge the results into a new DataFrame or Series.

  - Why Use groupby()

      - Summarize data by categories

      - Perform aggregate statistics (e.g., average sales per region)

      - Avoid hand-written loops over categories, which keeps code short and fast on large datasets

      - Enable multi-level grouping for complex analysis

    - Example

      import pandas as pd

      data = {
          "Department": ["HR", "HR", "IT", "IT", "Finance"],
          "Employee": ["Alice", "Bob", "Charlie", "David", "Eva"],
          "Salary": [50000, 55000, 60000, 62000, 58000]
      }

      df = pd.DataFrame(data)

      # Group by Department and calculate the average salary
      result = df.groupby("Department")["Salary"].mean()
      print(result)

      Output:

      Department
      Finance    58000.0
      HR         52500.0
      IT         61000.0
      Name: Salary, dtype: float64
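To make the split, apply, and combine steps concrete, the sketch below performs them by hand and compares the result with `groupby().mean()`. The variable names (`manual`, `combined`, `summary`) are illustrative, and the data reuses the salaries from the example above.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["HR", "HR", "IT", "IT", "Finance"],
    "Salary": [50000, 55000, 60000, 62000, 58000],
})

# Split: iterating over a groupby object yields (key, sub-DataFrame) pairs.
# Apply: compute the mean of each group by hand.
manual = {dept: group["Salary"].mean() for dept, group in df.groupby("Department")}

# Combine: groupby().mean() does all three steps and returns a Series.
combined = df.groupby("Department")["Salary"].mean()

# agg() applies several aggregations in one pass.
summary = df.groupby("Department")["Salary"].agg(["mean", "max", "count"])
print(summary)
```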

5. Why is Seaborn preferred for statistical visualizations ?
  - Seaborn is a Python data visualization library built on top of Matplotlib. It is widely preferred in data science and statistical analysis because it provides high-level, easy-to-use functions that create beautiful and informative plots with less code.

  - Example

    With Matplotlib:

    import matplotlib.pyplot as plt
    import pandas as pd

    df = pd.DataFrame({"x": [1,2,3,4,5], "y": [2,4,5,4,5]})
    plt.scatter(df["x"], df["y"])
    plt.xlabel("x")
    plt.ylabel("y")
    plt.show()

    With Seaborn (axis labels are taken from the column names):

    import seaborn as sns
    import matplotlib.pyplot as plt
    import pandas as pd

    df = pd.DataFrame({"x": [1,2,3,4,5], "y": [2,4,5,4,5]})
    sns.scatterplot(x="x", y="y", data=df)
    plt.show()

6. What are the differences between NumPy arrays and Python lists ?
  - Although both NumPy arrays and Python lists are used to store collections of elements, they are fundamentally different in terms of performance, functionality, and memory usage.

  - Example

      -  Python List

     lst = [1, 2, 3, 4]

     result = [x * 2 for x in lst]  # Need a loop

     print(result)  # [2, 4, 6, 8]

       - NumPy Array

      import numpy as np

      arr = np.array([1, 2, 3, 4])
      
      result = arr * 2  # Vectorized operation (no loop needed)
      
      print(result)  # [2 4 6 8]
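The difference in behavior shows up most clearly when the same operator is applied to both types. This is a small sketch with illustrative names, not an exhaustive comparison.

```python
import numpy as np

lst = [1, 2, 3]
arr = np.array([1, 2, 3])

# The same operators mean different things for the two types.
list_times_two = lst * 2      # repetition: [1, 2, 3, 1, 2, 3]
array_times_two = arr * 2     # element-wise arithmetic: [2 4 6]
list_plus = lst + [10]        # concatenation: [1, 2, 3, 10]
array_plus = arr + 10         # broadcast addition: [11 12 13]

# A list can hold mixed types; an array converts everything to one dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)            # float64
```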

7. What is a heatmap, and when should it be used ?
  - A heatmap is a data visualization technique that represents values in a matrix (2D table) format using colors instead of numbers.

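      - With many common colormaps, higher values are shown with darker or warmer colors (red, orange, etc.).

      - Lower values are then shown with lighter or cooler colors (blue, green, etc.). The exact mapping depends on the colormap chosen, so a color bar should accompany the plot.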

     It helps reveal patterns, correlations, and density at a glance.

  - When Should a Heatmap Be Used?

     A heatmap is most useful when you want to:

       - Visualize Correlations

          Example: Show correlation between variables in a dataset (e.g., how "Age" relates to "Income").

       - Analyze Large Datasets Quickly

          Instead of reading big tables of numbers, heatmaps show patterns visually.

       - Show Intensity or Frequency

           Example: Website heatmaps (where users click the most).

           Example: Population density across regions.

       - Compare Across Two Dimensions

           Example: Sales by Month vs Product Category.

8. What does the term “vectorized operation” mean in NumPy ?
  - A vectorized operation in NumPy means applying an operation directly on entire arrays (vectors, matrices, ndarrays) without writing explicit Python loops.

     - Instead of iterating element by element, NumPy uses optimized C and Fortran code under the hood.

     - This makes operations much faster than equivalent Python loops over lists, because the per-element work happens in compiled code rather than in the interpreter.

  - Example:

    import numpy as np

    arr = np.array([1, 2, 3, 4])
    
    result = arr * 2   # Vectorized operation
    
    print(result)  # [2 4 6 8]
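As a further illustration, the sketch below writes the same formula once as a Python loop and once as a vectorized expression, and then vectorizes a conditional with `np.where`. The temperature data and variable names are made up for the example.

```python
import numpy as np

celsius = np.array([0.0, 25.0, 37.5, 100.0])

# Loop version: one Python-level operation per element.
fahrenheit_loop = np.array([c * 9 / 5 + 32 for c in celsius])

# Vectorized version: the whole expression is evaluated on the array at once.
fahrenheit_vec = celsius * 9 / 5 + 32

print(np.array_equal(fahrenheit_loop, fahrenheit_vec))  # True

# Conditionals can be vectorized as well, using a boolean mask.
labels = np.where(celsius > 30, "hot", "mild")
print(labels)
```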

9.  How does Matplotlib differ from Plotly ?
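  - Both Matplotlib and Plotly are popular Python libraries for data visualization, but they have different strengths: Matplotlib produces static figures with fine-grained control over every element, while Plotly produces interactive, browser-based figures with hover, zoom, and pan built in.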

  - Example:
     - Matplotlib (static)

       import matplotlib.pyplot as plt

       x = [1, 2, 3, 4, 5]
     
       y = [10, 20, 25, 30, 40]

       plt.scatter(x, y)
     
       plt.title("Matplotlib Scatter Plot")
     
       plt.show()

       Output: Static image (good for research papers, reports).

      - Plotly (interactive)

        import plotly.express as px

        x = [1, 2, 3, 4, 5]
        
        y = [10, 20, 25, 30, 40]

        fig = px.scatter(x=x, y=y, title="Plotly Scatter Plot")
        
        fig.show()

        Output: Interactive plot (hover to see values, zoom, pan).

10. What is the significance of hierarchical indexing in Pandas ?
   - Organizes Complex Data

        Lets you store and access multi-dimensional data (like time-series by year & month or sales by region & product) in a flat DataFrame.

   - Improves Data Analysis

        Enables powerful operations such as grouping, slicing, and aggregation across multiple levels.

   - Easier Data Selection

        You can select subsets of data using combinations of multiple index levels.

   - Supports Pivoted Data

        Hierarchical indexing is heavily used in pivot tables and groupby() results.

   - Bridges Multi-Dimensional Data → 2D Table

        Useful when working with multi-level categorical variables.


      # Assuming df is a DataFrame whose index has two levels, (Region, Product):

      # Select sales for Region1
      print(df.loc["Region1"])

      # Select sales for ProductA in Region2
      print(df.loc[("Region2", "ProductA")])
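A self-contained version of this kind of selection might look like the sketch below. The region names, product names, and sales figures are invented for illustration.

```python
import pandas as pd

# Build a two-level (Region, Product) index.
index = pd.MultiIndex.from_tuples(
    [("Region1", "ProductA"), ("Region1", "ProductB"),
     ("Region2", "ProductA"), ("Region2", "ProductB")],
    names=["Region", "Product"],
)
df = pd.DataFrame({"Sales": [100, 150, 200, 250]}, index=index)

region1 = df.loc["Region1"]                       # all products in Region1
r2_a = df.loc[("Region2", "ProductA"), "Sales"]   # a single value
per_region = df.groupby(level="Region")["Sales"].sum()  # aggregate over one level
wide = df["Sales"].unstack("Product")             # move Product into columns

print(wide)
```

`unstack()` shows the "multi-dimensional data in a 2D table" idea directly: the inner index level becomes the column axis.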

    

11. What is the role of Seaborn’s pairplot() function ?
   - Explore Relationships → Helps you quickly see how variables are related to each other.

   - Visualize Distributions → Displays each feature’s distribution along the diagonal.

   - Detect Patterns → Identifies clusters, trends, or correlations in data.

   - Detect Outliers → Outliers can be spotted in scatter plots.

   - Exploratory Analysis → Useful in the exploratory data analysis (EDA) phase, before applying machine learning models.

    - Example:

       import seaborn as sns
       import matplotlib.pyplot as plt

       # Load example dataset
       iris = sns.load_dataset("iris")

       # Create pairplot
       sns.pairplot(iris, hue="species", diag_kind="kde")
       plt.show()

12. What is the purpose of the describe() function in Pandas ?
  - Quick Data Summary → Gives you a snapshot of dataset statistics.

  - Detects Data Quality Issues → A count lower than the number of rows reveals missing values, and the min, max, and quartiles help spot outliers or skewed data.

  - Exploratory Data Analysis (EDA) → Used in the early stages of analysis to understand the dataset.

  - Saves Time → Instead of calculating stats manually, describe() does it in one step.

     - Example:

        import pandas as pd

        # Sample DataFrame
        data = {
            "Age": [25, 30, 35, 40, 45],
            "Salary": [50000, 55000, 60000, 65000, 70000]
        }

        df = pd.DataFrame(data)

        # Use describe()
        print(df.describe())

       - Output:

                     Age        Salary
        count   5.000000      5.000000
        mean   35.000000  60000.000000
        std     7.905694   7905.694150
        min    25.000000  50000.000000
        25%    30.000000  55000.000000
        50%    35.000000  60000.000000
        75%    40.000000  65000.000000
        max    45.000000  70000.000000

13. Why is handling missing data important in Pandas ?
   - In real-world datasets, missing values (NaN, None) are very common — they might occur due to human error, sensor failure, incomplete surveys, or merging datasets. If not handled properly, they can cause incorrect results, errors, or bias in analysis and machine learning models.

   - Example

     import pandas as pd

     data = {
         "Name": ["Alice", "Bob", "Charlie", "David"],
         "Age": [25, None, 30, None],
         "Salary": [50000, 60000, None, 55000]
     }

     df = pd.DataFrame(data)
     print("Original DataFrame:\n", df)

     # Handle missing data
     print("\nDrop rows with NaN:\n", df.dropna())
     print("\nFill missing values:\n", df.fillna({"Age": df["Age"].mean(), "Salary": 0}))
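Before choosing between dropping and filling, it usually helps to measure how much is missing and to see how NaN affects a calculation. The sketch below uses a trimmed copy of the data above; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, None, 30, None],
    "Salary": [50000, 60000, None, 55000],
})

# Count the missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# pandas skips NaN by default when aggregating ...
mean_skipping_nan = df["Age"].mean()              # (25 + 30) / 2 = 27.5
# ... but NaN propagates if skipping is turned off.
mean_with_nan = df["Age"].mean(skipna=False)      # nan

# Only rows with no missing value survive dropna().
rows_complete = df.dropna()
```

Here `dropna()` keeps a single row out of four, which is a reminder that dropping can discard most of a small dataset.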

14. What are the benefits of using Plotly for data visualization ?
   - Plotly is a powerful, interactive, and web-friendly Python library for data visualization. It is widely used in data science, business intelligence, and dashboards because of its rich features.

   - Example

     import plotly.express as px

     # Sample data
     df = px.data.iris()

     # Interactive scatter plot
     fig = px.scatter(df, x="sepal_width", y="sepal_length",
                      color="species", size="petal_length",
                      title="Iris Dataset - Interactive Plot")

     fig.show()

15. How does NumPy handle multidimensional arrays ?
   - At the heart of NumPy is the ndarray (N-dimensional array) object. Unlike Python lists (which are essentially 1D containers that can hold other lists for nesting), NumPy provides true multidimensional arrays stored in contiguous memory blocks, making them efficient and fast.

  - Example

     import numpy as np

     # Create a 2D array (matrix)
     arr = np.array([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9]])

     print("Array:\n", arr)
     print("Dimensions:", arr.ndim)   # 2
     print("Shape:", arr.shape)       # (3, 3)
     print("Element [1,2]:", arr[1, 2])  # 6

     # Reshape into 3D array
     arr3d = arr.reshape(3, 3, 1)
     print("3D shape:", arr3d.shape)  # (3, 3, 1)
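The sketch below shows how the axes of an ndarray are used in practice, and how the shape maps onto one flat block of memory through strides. The dtype is fixed to `int64` here only so that the stride values are predictable; that choice is an assumption of the example.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]], dtype=np.int64)

col_sums = arr.sum(axis=0)   # collapse the rows    -> [5 7 9]
row_sums = arr.sum(axis=1)   # collapse the columns -> [ 6 15]

# Strides: bytes to step to reach the next row and the next column.
# With 8-byte integers and 3 columns, a row step is 24 bytes.
print(arr.strides)           # (24, 8)

second_column = arr[:, 1]    # index along both axes at once -> [2 5]
transposed = arr.T           # shape (3, 2); no data is copied
```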

16. What is the role of Bokeh in data visualization ?
   - Interactive Plots

     Unlike Matplotlib (mostly static), Bokeh creates fully interactive visualizations with zooming, panning, hovering, and tooltips.

   - Web-Ready Visualizations

     Outputs visualizations as HTML, JavaScript, or JSON, making them easy to embed in web apps.

   - Scalable Dashboards

     Can create complex dashboards with multiple linked plots, widgets, and real-time streaming data.

   - Handles Large Datasets

     Optimized for rendering large or streaming datasets in the browser.

   - Integration with Other Tools

     Works with Flask, Django, and Jupyter notebooks.

     Can also connect with Pandas, NumPy, and streaming sources.

   - Server Applications

     The Bokeh Server allows building dynamic, interactive web apps in Python without writing JavaScript.

      - Example:

        from bokeh.plotting import figure, show
        from bokeh.io import output_notebook

        # Enable inline display in Jupyter
        output_notebook()

        # Create simple interactive plot
        p = figure(title="Simple Bokeh Line Chart", x_axis_label="X", y_axis_label="Y")
        p.line([1, 2, 3, 4, 5], [6, 7, 2, 4, 5], line_width=2)

        show(p)

17.  Explain the difference between apply() and map() in Pandas
   - map() in Pandas

       Series.map() applies a function, dictionary, or Series element-wise to each value of a Series (1D data). This makes it well suited to simple transformations, such as arithmetic, string manipulation, or replacing values through a lookup table. With a dictionary, values that have no matching key become NaN. Because it works strictly element by element, it is not designed for row-wise or column-wise operations on a DataFrame. (pandas 2.1 and later also provide an element-wise DataFrame.map(), which replaces the older applymap().)

       -  Example:

      import pandas as pd

      s = pd.Series([1, 2, 3, 4])

      # Double each value
      print(s.map(lambda x: x * 2))

      # Replace values using a dictionary (3 and 4 have no key, so they become NaN)
      print(s.map({1: "One", 2: "Two"}))

   - apply() in Pandas

        The apply() function is more flexible because it works with both Series and DataFrames. When used with a Series, it behaves similarly to map() by applying a function element-wise. However, its real power shows with DataFrames, where it can apply functions along an axis (rows or columns). This makes apply() better suited for complex transformations, aggregations, or operations involving multiple columns at once.

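          - Example:

          import pandas as pd

          df = pd.DataFrame({
              "A": [1, 2, 3],
              "B": [4, 5, 6]
          })

          # Apply to a Series (element-wise)
          print(df["A"].apply(lambda x: x**2))

          # Apply to a DataFrame with axis=0: the function receives each column,
          # so this prints one sum per column (A: 6, B: 15)
          print(df.apply(sum, axis=0))

          # Apply to a DataFrame with axis=1: the function receives each row
          print(df.apply(lambda row: row["A"] + row["B"], axis=1))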
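The contrast can be summarized in one short sketch. It also shows that a row-wise `apply()` can often be replaced by plain column arithmetic, which avoids a Python-level loop. The variable names are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
mapped = s.map({1: "One", 2: "Two"})   # 3 has no key, so it becomes NaN

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# axis=1: the lambda receives one row at a time.
row_sum_apply = df.apply(lambda row: row["A"] + row["B"], axis=1)

# Same result with vectorized column arithmetic.
row_sum_vectorized = df["A"] + df["B"]

# axis=0 (the default): the lambda receives one column at a time.
col_range = df.apply(lambda col: col.max() - col.min())
```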

18. What are some advanced features of NumPy ?
  - NumPy’s advanced features—like broadcasting, vectorization, fancy indexing, linear algebra tools, random sampling, and FFT—make it a high-performance foundation for numerical computing and machine learning.
    
  - Example:

     import numpy as np

     # Broadcasting: add a 1D array to a 2D array
     A = np.array([[1, 2, 3],
                   [4, 5, 6]])
     B = np.array([10, 20, 30])
     print("Broadcasting:\n", A + B)

     # Fancy indexing: select specific elements by position
     C = np.array([10, 20, 30, 40, 50])
     print("Fancy Indexing:", C[[0, 2, 4]])  # Picks 1st, 3rd, 5th elements

  - Output:

     Broadcasting:
      [[11 22 33]
      [14 25 36]]
     Fancy Indexing: [10 30 50]

19. How does Pandas simplify time series analysis ?
  - Pandas simplifies time series analysis by offering automatic date parsing, easy indexing, resampling, window operations, and shifting — all built on efficient NumPy arrays, making time-based data handling concise and powerful.
  
  - Example:

    import pandas as pd
    import numpy as np

    # Create a date range
    dates = pd.date_range(start="2023-01-01", periods=10, freq="D")

    # Create a simple time series (random sales data)
    data = np.random.randint(50, 200, size=10)
    ts = pd.DataFrame({"Date": dates, "Sales": data})
    ts.set_index("Date", inplace=True)

    print("Original Time Series:")
    print(ts)

    # 1. Slice data by date range
    print("\nSales in first 5 days of Jan:")
    print(ts.loc["2023-01-01":"2023-01-05"])

    # 2. Resample to weekly sales sum
    print("\nWeekly Sales Sum:")
    print(ts.resample("W").sum())

    # 3. Rolling average (3-day moving average)
    print("\n3-Day Moving Average:")
    print(ts.rolling(window=3).mean())

    # 4. Shift sales by 1 day (lagging)
    print("\nSales shifted by 1 day:")
    print(ts.shift(1))
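Because the example above uses random data, its output changes on every run. The sketch below repeats the same operations on fixed, made-up values so the results can be checked by hand.

```python
import pandas as pd

dates = pd.date_range(start="2023-01-01", periods=6, freq="D")
ts = pd.Series([10, 20, 30, 40, 50, 60], index=dates, name="Sales")

# Resample into 2-day bins and sum each bin: 10+20, 30+40, 50+60.
two_day_totals = ts.resample("2D").sum()

# 3-day moving average; the first two entries are NaN (window not yet full).
rolling_mean = ts.rolling(window=3).mean()

# Day-over-day change, built from shift(); equivalent to ts.diff().
day_over_day = ts - ts.shift(1)

# Label-based slicing on dates includes both endpoints.
jan_3_to_4 = ts.loc["2023-01-03":"2023-01-04"]
```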

20. What is the role of a pivot table in Pandas ?
  - In Pandas, a pivot table plays the role of summarizing, aggregating, and reshaping data to make it easier to analyze patterns and relationships in large datasets. It works very similarly to Excel Pivot Tables but with much more flexibility.
  
  - Example:

      import pandas as pd

      # Sample dataset
      data = {
          "Region": ["East", "West", "East", "West", "East", "West"],
          "Product": ["A", "A", "B", "B", "C", "C"],
          "Sales": [200, 120, 340, 150, 300, 220]
      }

      df = pd.DataFrame(data)
      print("Original Data:")
      print(df)

      # Create a pivot table
      pivot = pd.pivot_table(df,
                             values="Sales",
                             index="Region",
                             columns="Product",
                             aggfunc="sum",
                             fill_value=0)

      print("\nPivot Table:")
      print(pivot)

     - Output

         Pivot Table:
         Product    A    B    C
         Region
         East     200  340  300
         West     120  150  220
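One way to see what a pivot table does is to rebuild it from `groupby()` and `unstack()`. The sketch below also adds row and column totals with `margins=True`. The label "Total" is an arbitrary choice for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["East", "West", "East", "West", "East", "West"],
    "Product": ["A", "A", "B", "B", "C", "C"],
    "Sales": [200, 120, 340, 150, 300, 220],
})

pivot = pd.pivot_table(df, values="Sales", index="Region",
                       columns="Product", aggfunc="sum", fill_value=0)

# The same table, built step by step: aggregate per (Region, Product),
# then move the Product level from the index into the columns.
via_groupby = (df.groupby(["Region", "Product"])["Sales"]
                 .sum()
                 .unstack(fill_value=0))

# margins=True appends totals for each row and each column.
with_totals = pd.pivot_table(df, values="Sales", index="Region",
                             columns="Product", aggfunc="sum",
                             fill_value=0, margins=True, margins_name="Total")
print(with_totals)
```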

21. Why is NumPy’s array slicing faster than Python’s list slicing ?
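   - Basic NumPy slicing is faster because it returns a view: a new array object that points at the same memory buffer, so no elements are copied and the cost does not depend on the slice length. Slicing a Python list builds a new list and copies a reference to every selected element, so the cost grows with the slice length.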

   - Example:

     import numpy as np
     import time

     py_list = list(range(10**6))    # Python list
     np_array = np.arange(10**6)     # NumPy array

     # List slicing (copies 100,000 references)
     start = time.perf_counter()
     list_slice = py_list[100000:200000]
     end = time.perf_counter()
     print("List slicing time:", end - start)

     # NumPy slicing (creates a view, no copy)
     start = time.perf_counter()
     array_slice = np_array[100000:200000]
     end = time.perf_counter()
     print("NumPy slicing time:", end - start)

     Illustrative output (in seconds; exact timings vary by machine and run):

     List slicing time: 0.007
     NumPy slicing time: 0.000001
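The view behavior has a side effect worth knowing: writing to a NumPy slice changes the original array, while a list slice is independent. A small sketch, with illustrative names:

```python
import numpy as np

np_array = np.arange(10)
py_list = list(range(10))

array_slice = np_array[2:5]   # a view onto the same memory
list_slice = py_list[2:5]     # a new, independent list

shares = np.shares_memory(np_array, array_slice)   # True

array_slice[0] = 99           # also changes np_array[2]
list_slice[0] = 99            # does not change py_list[2]
print(np_array[2], py_list[2])   # 99 2

# Use copy() when an independent array is needed.
independent = np_array[2:5].copy()
independent[1] = -1           # np_array[3] is unaffected
```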

22. What are some common use cases for Seaborn ?
   - Seaborn is a statistical data visualization library built on top of Matplotlib. Its main strength is making beautiful, informative, and complex plots with just a few lines of code. It’s widely used in data analysis and machine learning workflows because it integrates well with Pandas DataFrames.

   - Example:

     import seaborn as sns
     import pandas as pd
     import matplotlib.pyplot as plt

     # Sample dataset
     data = {
         "Math": [88, 92, 80, 89, 100, 86, 94],
         "Science": [84, 94, 82, 91, 96, 85, 90],
         "English": [78, 85, 80, 86, 89, 83, 88]
     }

     df = pd.DataFrame(data)

     # Create correlation heatmap
     sns.heatmap(df.corr(), annot=True, cmap="coolwarm")
     plt.title("Correlation Heatmap of Subjects")
     plt.show()

# Practical Questions



1.  How do you create a 2D NumPy array and calculate the sum of each row ?

# Step 1: create a 2D NumPy array
import numpy as np

# Create a 2D NumPy array (3 rows, 4 columns)
arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12]])

print("2D Array:")
print(arr)

# Step 2: calculate the sum of each row (axis=1 adds up the values within each row)
row_sum = np.sum(arr, axis=1)

print("\nSum of Each Row:")
print(row_sum)

OUTPUT:

2D Array:
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Sum of Each Row:
[10 26 42]

2. Write a Pandas script to find the mean of a specific column in a DataFrame ?

import pandas as pd

# Create a sample DataFrame
data = {
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
    "Salary": [50000, 60000, 75000, 80000]
}

df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

# Find the mean of a specific column (e.g., Salary)
mean_salary = df["Salary"].mean()

print("\nMean of Salary column:", mean_salary)

OUTPUT:

Original DataFrame:
      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   75000
3    David   40   80000

Mean of Salary column: 66250.0

3.  Create a scatter plot using Matplotlib ?
  


import matplotlib.pyplot as plt

# Sample data
x = [5, 7, 8, 7, 6, 9, 5, 6, 7, 8]
y = [99, 86, 87, 88, 100, 86, 103, 87, 94, 78]

# Create scatter plot
plt.scatter(x, y, color="blue", marker="o", s=100, alpha=0.7)

# Add labels and title
plt.xlabel("X-axis (Example Values)")
plt.ylabel("Y-axis (Example Values)")
plt.title("Simple Scatter Plot")

# Show plot
plt.show()

4. How do you calculate the correlation matrix using Seaborn and visualize it with a heatmap ?

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Sample dataset
data = {
    "Math": [88, 92, 80, 89, 100, 86, 94],
    "Science": [84, 94, 82, 91, 96, 85, 90],
    "English": [78, 85, 80, 86, 89, 83, 88],
    "History": [82, 90, 78, 85, 92, 84, 86]
}

df = pd.DataFrame(data)

# 1. Calculate correlation matrix
corr_matrix = df.corr()

print("Correlation Matrix:")
print(corr_matrix)

# 2. Visualize with heatmap
plt.figure(figsize=(6,4))
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", fmt=".2f")

plt.title("Correlation Heatmap")
plt.show()

5. Generate a bar plot using Plotly ?

import plotly.express as px
import pandas as pd

# Sample dataset
data = {
    "Fruits": ["Apples", "Bananas", "Cherries", "Dates", "Elderberries"],
    "Sales": [150, 200, 120, 90, 180]
}

df = pd.DataFrame(data)

# Create bar plot
fig = px.bar(df, x="Fruits", y="Sales", color="Fruits",
             title="Fruit Sales Bar Plot")

# Show interactive plot
fig.show()

6. Create a DataFrame and add a new column based on an existing column ?

import pandas as pd

# Create a DataFrame
data = {
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Salary": [50000, 60000, 75000, 80000]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Add a new column 'Bonus' (10% of Salary)
df["Bonus"] = df["Salary"] * 0.10

print("\nDataFrame with New Column:")
print(df)

OUTPUT:

Original DataFrame:
      Name  Salary
0    Alice   50000
1      Bob   60000
2  Charlie   75000
3    David   80000

DataFrame with New Column:
      Name  Salary   Bonus
0    Alice   50000  5000.0
1      Bob   60000  6000.0
2  Charlie   75000  7500.0
3    David   80000  8000.0

7. Write a program to perform element-wise multiplication of two NumPy arrays ?

import numpy as np

# Create two NumPy arrays
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([10, 20, 30, 40])

print("Array 1:", arr1)
print("Array 2:", arr2)

# Element-wise multiplication
result = arr1 * arr2   # or np.multiply(arr1, arr2)

print("\nElement-wise Multiplication Result:")
print(result)

OUTPUT:

Array 1: [1 2 3 4]
Array 2: [10 20 30 40]

Element-wise Multiplication Result:
[ 10  40  90 160]

8. Create a line plot with multiple lines using Matplotlib ?

import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y1 = [2, 4, 6, 8, 10]      # Line 1
y2 = [1, 3, 5, 7, 9]       # Line 2
y3 = [2, 3, 4, 5, 6]       # Line 3

# Plot multiple lines
plt.plot(x, y1, label="Line 1", marker="o")
plt.plot(x, y2, label="Line 2", marker="s")
plt.plot(x, y3, label="Line 3", marker="^")

# Add labels, title, and legend
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Line Plot with Multiple Lines")
plt.legend()

# Show plot
plt.show()

9. Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold ?

import pandas as pd

# Create a DataFrame
data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, 30, 35, 40, 28],
    "Salary": [50000, 60000, 75000, 80000, 55000]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Filter rows where Salary > 60000
filtered_df = df[df["Salary"] > 60000]

print("\nFiltered DataFrame (Salary > 60000):")
print(filtered_df)

OUTPUT :

Original DataFrame:
      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   75000
3    David   40   80000
4      Eva   28   55000

Filtered DataFrame (Salary > 60000):
      Name  Age  Salary
2  Charlie   35   75000
3    David   40   80000

10. Create a histogram using Seaborn to visualize a distribution ?

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Generate random data (normal distribution)
data = np.random.randn(1000)

# Create histogram
sns.histplot(data, bins=30, kde=True, color="skyblue")

# Add title and labels
plt.title("Distribution of Random Data")
plt.xlabel("Value")
plt.ylabel("Frequency")

# Show plot
plt.show()

11. Perform matrix multiplication using NumPy ?

import numpy as np

# Define two matrices (2x3 and 3x2)
A = np.array([[1, 2, 3],
              [4, 5, 6]])

B = np.array([[7, 8],
              [9, 10],
              [11, 12]])

print("Matrix A:\n", A)
print("Matrix B:\n", B)

# Perform matrix multiplication
C = np.dot(A, B)   # or A @ B

print("\nResult of Matrix Multiplication (A x B):\n", C)

OUTPUT:

Matrix A:
 [[1 2 3]
 [4 5 6]]
Matrix B:
 [[ 7  8]
 [ 9 10]
 [11 12]]

Result of Matrix Multiplication (A x B):
 [[ 58  64]
 [139 154]]


12. Use Pandas to load a CSV file and display its first 5 rows ?

import pandas as pd

# Load CSV file (replace 'data.csv' with your file path)
df = pd.read_csv("data.csv")

# Display the first 5 rows
print("First 5 rows of the CSV file:")
print(df.head())
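The script above needs a real `data.csv` on disk. To try `read_csv()` and `head()` without a file, the CSV text can be supplied through `io.StringIO`, as in the sketch below. The column names and rows are invented for illustration.

```python
import io
import pandas as pd

# In-memory stand-in for a CSV file (7 data rows plus a header).
csv_text = """name,age,city
Alice,25,New York
Bob,30,London
Charlie,35,Paris
David,40,Berlin
Eva,28,Madrid
Frank,33,Rome
Grace,29,Oslo
"""

df = pd.read_csv(io.StringIO(csv_text))

first_five = df.head()     # head() returns the first 5 rows by default
first_two = df.head(2)     # pass n to change the number of rows
print(first_five)
```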

13. Create a 3D scatter plot using Plotly ?

import plotly.express as px
import pandas as pd

# Sample dataset
data = {
    "X": [5, 7, 8, 7, 6, 9, 5, 6, 7, 8],
    "Y": [99, 86, 87, 88, 100, 86, 103, 87, 94, 78],
    "Z": [50, 60, 65, 70, 80, 75, 85, 90, 95, 100],
    "Category": ["A", "B", "A", "B", "A", "B", "A", "B", "A", "B"]
}

df = pd.DataFrame(data)

# Create 3D scatter plot
fig = px.scatter_3d(df, x="X", y="Y", z="Z",
                    color="Category",  # color points by category
                    size="Z",          # point size based on Z
                    symbol="Category", # different symbols for categories
                    title="3D Scatter Plot Example")

# Show interactive plot
fig.show()