<a href="https://colab.research.google.com/github/Requenamar3/Machine-Learning/blob/main/Machine_Learning_Assigment1_ipynb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Understanding NumPy and Pandas: A Practical Guide with Environmental Data
Welcome to this interactive tutorial! Today, we'll work with real environmental radiation data from the EPA's RadNet Database to learn fundamental concepts in NumPy and Pandas. This tutorial will help you understand how to manipulate and analyze scientific data effectively.

### Setting Up Our Environment
First, let's import our essential libraries:

In [62]:
import numpy as np
import pandas as pd

### Reading and Understanding Our Data
The dataset we're working with contains radiation measurements from various locations across the United States. This is real-world data that scientists use to monitor environmental radiation levels.

When working in Google Colab, we first need to connect to our Google Drive to access our data files:

In [63]:
# @title
from google.colab import drive

# Mount Google Drive
drive.mount('/content/drive')

# Load the CSV file
file_path = '/content/drive/MyDrive/Colab Notebooks/Machine learning/Data set/radiation_clean.csv'
df_assig1 = pd.read_csv(file_path)


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


This setup is crucial because:

1. Google Colab runs in a temporary environment.
2. We need to establish a connection to access files stored in Google Drive.
3. The mount point /content/drive becomes our gateway to access Drive files.

__AFTER__ you read the CSV file into a DataFrame, take a look at __df_assig1__ by running the following code cell:

In [64]:
# @title
# View dataset information
df_assig1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 606 entries, 0 to 605
Data columns (total 18 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   State           606 non-null    object 
 1   Location        606 non-null    object 
 2   Date Posted     606 non-null    object 
 3   Date Collected  606 non-null    object 
 4   Sample Type     606 non-null    object 
 5   Unit            606 non-null    object 
 6   Ba-140          605 non-null    float64
 7   Co-60           605 non-null    float64
 8   Cs-134          606 non-null    int64  
 9   Cs-136          506 non-null    float64
 10  Cs-137          606 non-null    int64  
 11  I-131           606 non-null    int64  
 12  I-132           606 non-null    int64  
 13  I-133           605 non-null    float64
 14  Te-129          506 non-null    float64
 15  Te-129m         506 non-null    float64
 16  Te-132          606 non-null    int64  
 17  Ba-140.1        0 non-null      flo

💡 Key Insight: Notice how our dataset has 18 columns, including location information and measurements for different radioactive isotopes. Some columns have missing values (when Non-Null Count is less than 606), which is common in real-world data.
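
Missing values can also be counted directly instead of being read off the `info()` output. The sketch below uses a small hypothetical DataFrame (`demo_df`, with made-up values) rather than the real file, so it runs anywhere. On the real data, the equivalent call would presumably be `df_assig1.isna().sum()`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_assig1 (values are made up for illustration)
demo_df = pd.DataFrame({
    "State": ["ID", "AK", "AK", "CA"],
    "I-131": [0, 0, 1, 6],
    "Cs-136": [np.nan, 0.0, np.nan, 0.0],
})

# isna() marks missing cells as True; sum() then counts them per column
missing_per_column = demo_df.isna().sum()
print(missing_per_column)

# Names of the columns that have at least one missing value
columns_with_gaps = missing_per_column[missing_per_column > 0].index.tolist()
print(columns_with_gaps)
```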

In [65]:
# @title
df_assig1.head()

Unnamed: 0,State,Location,Date Posted,Date Collected,Sample Type,Unit,Ba-140,Co-60,Cs-134,Cs-136,Cs-137,I-131,I-132,I-133,Te-129,Te-129m,Te-132,Ba-140.1
0,ID,Boise,2011-03-30,2011-03-23,Air Filter,pCi/m3,0.0,0.0,0,,0,0,0,0.0,,,0,
1,ID,Boise,2011-03-30,2011-03-23,Air Filter,pCi/m3,0.0,0.0,0,,0,0,0,0.0,,,0,
2,AK,Juneau,2011-03-30,2011-03-23,Air Filter,pCi/m3,0.0,0.0,0,,0,0,0,0.0,,,0,
3,AK,Nome,2011-03-30,2011-03-22,Air Filter,pCi/m3,0.0,0.0,0,,0,0,0,0.0,,,0,
4,AK,Nome,2011-03-30,2011-03-23,Air Filter,pCi/m3,0.0,0.0,0,,0,0,0,0.0,,,0,


In [66]:
# @title
# Display all records for the 'I-131' column
df_assig1['I-131']

Unnamed: 0,I-131
0,0
1,0
2,0
3,0
4,0
...,...
601,6
602,0
603,0
604,0


#Question 1
Create a Numpy array called _i131_data_ with the data from the column _I-131_ in __df_assig1__.

Note: I-131 (i.e., Iodine-131) is one of many radioactive substances.

Hint: Use the same method we used in class to read the PRCP column from the rain dataset into a NumPy array.


##Use Case Example:
###Scenario:
The researcher needs to perform numerical operations (e.g., statistical analysis or simulations) on radiation measurements (I-131) and requires the data in an efficient format for computation.

###Why Use NumPy?
* Performance: NumPy arrays are faster than Python lists for large datasets.
* Compatibility: NumPy arrays work seamlessly with numerical libraries like SciPy and TensorFlow.
* Efficiency: Arrays enable element-wise operations and advanced slicing.

**This step is commonly used when preparing data for numerical analysis or modeling.**
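
As a small illustration of element-wise operations, the sketch below applies one expression to every element of an array without writing a Python loop. The array `readings` and its values are hypothetical.

```python
import numpy as np

# Hypothetical measurements (not taken from the real dataset)
readings = np.array([0, 1, 2, 6])

# Arithmetic is applied to every element at once
doubled = readings * 2

# Comparisons are element-wise too and produce a Boolean array
above_mean = readings > readings.mean()

print(doubled)
print(above_mean)
```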

In [67]:
# @title
# Convert the 'I-131' column from the df into a NumPy array
i131_data = df_assig1['I-131'].to_numpy()
i131_data

array([  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   1,   1,   0,
         1,   2,   0,   0,   1,   0,   1,   1,   1,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   1,
         1,   0,   0,   1,   1,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   2,   3,   1,   1,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   1,   0,   1,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   1,
         0,   0,   0,   1,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   1,   1,   0,   1,   1,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   1,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   1,   0,   1,   0,   0,   1,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   1,   1,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   1,   1,   0,   1,   

#Question 2
(8 points): Return the first, sixth, and twelfth elements of i131_data. __Write only one line of code__ to answer this question.

##Sample Use Case: Retrieving Specific Measurements
###Scenario:
A researcher needs to analyze radiation levels at specific points in the dataset for targeted inspections or validations. Using the indices [0, 5, 11], they can retrieve the corresponding measurements from the i131_data array.
###Use Case Example:
The researcher may use this to:

1. Inspect the first recorded measurement.
2. Validate a mid-point entry for equipment calibration.
3. Analyze a specific value flagged for potential issues.

This approach ensures efficient access to key data points without iterating through the entire dataset.


In [68]:
# @title
# Getting specific elements
first_sixth_twelfth = i131_data[[0, 5, 11]]  # Returns elements at indices 0, 5, and 11
first_sixth_twelfth

array([0, 0, 1])
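
Passing a list of positions inside the brackets is often called integer-array (or "fancy") indexing. The following is a minimal sketch with a hypothetical array, `demo`, showing that the result is a new array rather than a view of the original.

```python
import numpy as np

# Hypothetical array with twelve elements
demo = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120])

# Positions 0, 5, and 11 are the first, sixth, and twelfth elements
picked = demo[[0, 5, 11]]
print(picked)

# Integer-array indexing returns a copy, so editing it leaves demo untouched
picked[0] = -1
print(demo[0])
```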

#Question 3
(8 points): Return the last five elements of i131_data.
__Write only one line of code__ to answer this question.


##Use Case Example: Inspecting Recent Measurements
###Scenario:
A researcher wants to examine the last 5 radiation measurements stored in the i131_data array (the most recent ones, provided the rows are in chronological order) to understand the latest trends or validate the data for anomalies.

###Use Case Benefits:
1. Targeted Focus: Quickly access the tail end of the dataset without processing the entire array.
2. Operational Efficiency: Use the data for monitoring, debugging, or immediate reporting.
3. Practicality: Ideal for datasets that grow over time, where only recent entries are critical for decision-making.

In [69]:
# @title
# Getting last five elements
last_five = i131_data[-5:]
last_five

array([6, 0, 0, 0, 0])
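
Unlike indexing with a list of positions, a slice such as `[-5:]` returns a view that shares memory with the original array. The sketch below uses a hypothetical array, `demo`, to show the difference between the view and an explicit copy.

```python
import numpy as np

demo = np.arange(10)  # the integers 0 through 9

# Start five positions from the end and run to the end of the array
last_five = demo[-5:]
print(last_five)

# A slice is a view: it shares memory with the original array
print(np.shares_memory(demo, last_five))

# Call .copy() when an independent array is needed
independent = demo[-5:].copy()
print(np.shares_memory(demo, independent))
```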

#Question 4
(8 points): Return how many times the value of i131_data is between 2 and 4 (both numbers included).

__Write only one line of code__ to answer this question.

##Use Case Example: Monitoring Moderate Radiation Levels
###Scenario:
A researcher wants to analyze how many radiation measurements fall within the moderate range (between 2 and 4 inclusive) to identify potential patterns or areas requiring closer inspection.

###Why This is Useful:
1. **Health and Safety Thresholds:**
  * Moderate radiation levels might represent an area of interest for monitoring but are not immediately hazardous.
2. **Trend Analysis:**
  * Identifying how often values fall in a specific range can help detect changes in radiation levels over time or across locations.
3. **Anomaly Detection:**
  * If moderate values appear unexpectedly in certain areas or timeframes, it might signal an anomaly.

###Use Case Benefits:
1. **Quick Insights:** Determines the count of values in a specific range efficiently.
2. **Targeted Actions:** Focus resources on areas with moderate activity for further testing or monitoring.
3. **Scalability:** Works seamlessly for large datasets, enabling real-time analysis.

In [70]:
# @title
# Count values between 2 and 4
between_2_and_4 = np.sum((i131_data >= 2) & (i131_data <= 4))
between_2_and_4

25

**Explanation:**
1. Condition (i131_data >= 2) & (i131_data <= 4):

 * This creates a Boolean array where each element is True if the corresponding value in i131_data is greater than or equal to 2 and less than or equal to 4, and False otherwise.
2. np.sum():

 * Sums the True values in the Boolean array (since True is treated as 1 and False as 0), giving the count of values that satisfy the condition.
3. Result:

 * The variable between_2_and_4 contains the total number of values in i131_data between 2 and 4, inclusive.

***Example:***
If i131_data = np.array([1, 2, 3, 4, 5]), then:

  * (i131_data >= 2) & (i131_data <= 4) → [False, True, True, True, False]
  * np.sum([False, True, True, True, False]) → 3

The output (between_2_and_4) would be 3.
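
The five-element example can be run as a short sketch. The name `sample` is hypothetical, and `np.count_nonzero` is shown only as an alternative way to count the True values.

```python
import numpy as np

# The comparison operators need a NumPy array, not a plain Python list
sample = np.array([1, 2, 3, 4, 5])

# Each comparison must be wrapped in parentheses before combining with &
mask = (sample >= 2) & (sample <= 4)
print(mask)

# True counts as 1 and False as 0, so both calls count the matches
count_with_sum = np.sum(mask)
count_with_nonzero = np.count_nonzero(mask)
print(count_with_sum, count_with_nonzero)
```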

#Question 5
 (8 points): Return how many times the value of i131_data is either zero or 1. To answer this question, you CANNOT use the symbol < in your answer (the less than symbol). You must answer it without using this symbol.

##Use Case Example: Identifying Negligible Radiation Levels
###Scenario:
A researcher is analyzing radiation data (i131_data) to determine how often the radiation level is either 0 (no activity) or 1 (minimal activity). This helps in identifying areas or times where radiation is negligible, allowing the researcher to exclude these cases from further analysis or focus only on higher levels.

###Why This is Useful:
1. Data Filtering:
  * Helps filter out non-significant measurements for more focused analysis.

2. Operational Validation:
  * Confirms that equipment is properly recording negligible values when no radiation is present.

3. Resource Allocation:
  * Reduces unnecessary monitoring or investigation in areas with consistently low radiation levels.

###Use Case Benefits:
1. Quick Count: Accurately identifies the frequency of 0 and 1 values in a large dataset.
2. Focus on Significant Data: Excludes negligible measurements from further analysis.
3. Ensures Data Quality: Validates that the dataset includes expected low or zero values in areas with no radiation activity.

In [71]:
# @title
# Get the total size (number of elements) in i131_data
total_size = i131_data.size
print(f"Total number of measurements: {total_size}")

# Count values that are 0 or 1
zeros_and_ones = np.sum((i131_data == 0) | (i131_data == 1))
print(f"Number of measurements with zero or minimal radiation: {zeros_and_ones}")

# Calculate the proportion of 0 and 1 values in relation to the total dataset
proportion = (zeros_and_ones / total_size) * 100
print(f"Proportion of measurements with zero or minimal radiation: {proportion:.2f}%")


# Count measurements with radiation levels greater than 1
higher_levels = np.sum(i131_data > 1)

# Calculate the proportion of higher radiation levels
proportion_higher = (higher_levels / total_size) * 100

# Print the results
print(f"Number of measurements with higher radiation levels: {higher_levels}")
print(f"Proportion of measurements with higher radiation levels: {proportion_higher:.2f}%")

Total number of measurements: 606
Number of measurements with zero or minimal radiation: 515
Proportion of measurements with zero or minimal radiation: 84.98%
Number of measurements with higher radiation levels: 91
Proportion of measurements with higher radiation levels: 15.02%
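
Another way to count zeros and ones without the `<` symbol is `np.isin`, which tests membership in a set of values. This is an alternative sketch rather than the method used in the cell above, and the array `sample` is hypothetical.

```python
import numpy as np

# Hypothetical measurements
sample = np.array([0, 1, 2, 0, 6, 1, 3])

# True wherever the element is one of the listed values
in_set = np.isin(sample, [0, 1])
count = np.count_nonzero(in_set)

# Same count using two equality tests combined with | (element-wise OR)
same_as_or = np.sum((sample == 0) | (sample == 1))

print(count, same_as_or)
```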


###Interpretation
Out of 606 total radiation measurements, 84.98% (515 values) show negligible or minimal radiation (0 or 1), indicating mostly safe conditions. However, 15.02% (91 values) have higher radiation levels, which may require further investigation to identify potential risks or anomalies.

###Recommendations:
1. Focus Analysis on Higher Levels: Investigate the 15.02% of measurements with elevated radiation for potential risks or anomalies.
2. Monitor High-Risk Areas: Prioritize regular monitoring in locations with higher readings.
3. Validate Low Measurements: Confirm equipment accuracy for the 84.98% of low readings to ensure proper functioning.
4. Optimize Resources: Allocate resources efficiently, focusing on areas with higher radiation levels.


#Question 6
(8 points): Return the names of the States where i131_data has been above 2. Use ONLY ONE line of code.

Hint: It is ok if your output shows the same State repeated many times.

##Use Case Example: Identifying States with High Radiation
###Scenario:
You want to identify which states recorded high radiation levels (above 2) in your dataset. The output will help focus monitoring or investigations on specific states where elevated radiation was observed.

###Why This is Useful:
1. Targeted Action: Enables the prioritization of resources and attention on states with higher radiation.
2. Trend Analysis: Helps detect patterns in radiation levels across geographic locations.
3. Reporting: Provides specific states for communication and response planning.


In [72]:
# @title
# Filter the DataFrame to find states where I-131 values are greater than 2
# `loc` is used to retrieve rows where the condition is true and only the 'State' column is returned
states_high_radiation = df_assig1.loc[i131_data > 2, 'State']

# Output the states with I-131 values above 2
print(states_high_radiation)


54     AK
408    HI
411    MO
415    AR
417    CA
       ..
574    CO
579    CT
585    MI
598    WA
601    CA
Name: State, Length: 82, dtype: object
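
Because the same state can appear many times in this output, it may help to list each state once or to count how often each one appears. The sketch below does both on a small hypothetical DataFrame (`demo_df`, made-up values), not on the real data.

```python
import pandas as pd

# Hypothetical stand-in for df_assig1
demo_df = pd.DataFrame({
    "State": ["AK", "CA", "CA", "HI", "CA", "MO"],
    "I-131": [3, 6, 0, 4, 5, 1],
})
values = demo_df["I-131"].to_numpy()

# Same pattern as the cell above: Boolean mask for rows, label for the column
high_states = demo_df.loc[values > 2, "State"]

# Each state once, in order of first appearance
unique_states = high_states.unique()

# Number of high readings per state
counts = high_states.value_counts()

print(unique_states)
print(counts)
```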


###Interpretation

1. High Radiation Levels:

  * There are 82 instances where I-131 radiation levels exceeded the threshold of 2, suggesting multiple cases of elevated radiation across different states.
2. States of Concern:

  * States like CA appear multiple times, indicating recurring high radiation levels in specific regions, which may require closer investigation.
  * States with fewer occurrences may reflect isolated or one-time events.
3. Geographic Spread:
  * The affected states are geographically diverse, suggesting no single concentrated area of concern but a need for broader monitoring.

###Recommendations:
1. Focus on Recurring States:
  * Prioritize monitoring and investigation in states like CA with repeated high-radiation measurements to identify trends or root causes.
2. Analyze Environmental Factors:
  * Investigate possible sources such as industrial activities, natural radiation, or equipment issues in states with multiple occurrences.
3. Monitor Isolated Cases:
  * Conduct follow-up checks in states with fewer instances to ensure these are not early indicators of emerging issues.
4. Enhance Monitoring:
  * Increase monitoring frequency or resolution in high-risk states to catch trends earlier and address potential risks proactively.

#Question 7
(8 points): Return how many times i131_data has a value of zero in California.

Hint: California is recorded as CA in the State column.


##Use Case Example: Identifying Zero Radiation Instances in California
###Scenario:
A researcher is analyzing radiation levels in California to determine how many times the radiation (I-131) was recorded as zero (0). This helps in assessing the frequency of no radiation activity in the state.

###Why This is Useful:
1. Data Validation:
  * Ensures that monitoring equipment is functioning correctly by recording zero radiation where expected.
2. Assessing Safe Conditions:
  * Identifies how often radiation in California is negligible or non-existent, reflecting safe environmental conditions.
3. Comparing Trends:
  * Provides insights into whether California has more or fewer zero radiation levels compared to other states.

In [73]:
# @title
# Get the total size (number of measurements) in the dataset for California
total_ca_measurements = np.sum(df_assig1['State'] == 'CA')
print(f"Total number of measurements in California: {total_ca_measurements}")

# Count how many measurements have a radiation level of 0 in California
ca_zeros = np.sum((df_assig1['State'] == 'CA') & (i131_data == 0))
print(f"Number of measurements with zero radiation in California: {ca_zeros}")

# Calculate the proportion of measurements with zero radiation in California
proportion_ca_zeros = (ca_zeros / total_ca_measurements) * 100
print(f"Proportion of zero radiation measurements in California: {proportion_ca_zeros:.2f}%")

# Count how many measurements have radiation levels greater than 1 in California
ca_higher_levels = np.sum((df_assig1['State'] == 'CA') & (i131_data > 1))
print(f"Number of measurements with higher radiation levels in California: {ca_higher_levels}")

# Calculate the proportion of higher radiation levels in California
proportion_ca_higher = (ca_higher_levels / total_ca_measurements) * 100
print(f"Proportion of higher radiation measurements in California: {proportion_ca_higher:.2f}%")


Total number of measurements in California: 42
Number of measurements with zero radiation in California: 31
Proportion of zero radiation measurements in California: 73.81%
Number of measurements with higher radiation levels in California: 7
Proportion of higher radiation measurements in California: 16.67%
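
Every value in the dataset is 0, 1, or above 1 (515 + 91 = 606 in the Question 5 output), so the California measurements split into three groups whose shares should add up to 100%. A quick arithmetic check using only the counts printed above:

```python
# Counts copied from the output above
total_ca = 42
zero_count = 31
above_one_count = 7

# Whatever is neither 0 nor above 1 must be exactly 1
exactly_one_count = total_ca - zero_count - above_one_count

shares = {
    "zero": zero_count / total_ca * 100,
    "exactly one": exactly_one_count / total_ca * 100,
    "above one": above_one_count / total_ca * 100,
}

for label, share in shares.items():
    print(f"{label}: {share:.2f}%")

print(f"sum: {sum(shares.values()):.2f}%")
```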


##Radiation Analysis Report for California
###Overview of Results:
1. Safe Radiation Levels (73.81%):
 * Out of 42 total radiation measurements in California, 31 recorded zero radiation levels. This indicates that the majority of the monitored areas are safe and free from radiation activity.
2. Elevated Radiation Levels (16.67%):
 * 7 measurements showed radiation levels above 1. These instances require further analysis to identify potential causes or anomalies.
3. Minimal Radiation Levels (9.52%):
 * The remaining 4 measurements have a value of 1, indicating low radiation levels that are not a cause for concern.

###Key Insights:
 * Most measurements confirm safe environmental conditions across the state.
 * A small portion of elevated readings may suggest localized issues or environmental changes that need further investigation.
 * Minimal radiation levels (a value of 1) are consistent with expected background radiation.

###Recommendations:
1. Investigate Elevated Radiation Levels:
 * Analyze the 7 measurements above 1 to understand:
   * The locations of these readings.
   * The timing or events associated with these elevated levels.
   * Possible causes, such as environmental factors, industrial activity, or equipment issues.
2. Validate Zero Radiation Readings:
 * Verify that the 31 zero readings accurately reflect conditions. Ensure equipment calibration and data collection processes are functioning correctly.
3. Maintain Regular Monitoring:
 * Continue consistent radiation monitoring to track changes over time and detect future anomalies early.
4. Prioritize Resources:
 * Focus investigation efforts on areas with elevated radiation while maintaining general monitoring to ensure widespread safety.

###Conclusion:
The data indicates that California is predominantly safe, with the majority of measurements showing no radiation activity. However, the small number of elevated readings highlights the need for focused investigation to ensure there are no underlying risks or recurring issues. Regular monitoring and data validation remain critical to maintaining environmental safety.

#Question 8
(8 points): Write a print statement that returns this message: _The highest registered value of I-131 is:_ FILL IN THE BLANK

where, FILL IN THE BLANK is the actual highest registered value of i131_data.

##Use Case Example: Reporting the Highest Radiation Value
###Scenario:
A researcher wants to identify and report the highest radiation level recorded in the dataset (i131_data). This helps highlight the peak measurement, which could indicate a significant event or area of concern.

###Why This is Useful:
1. Identify Critical Events:
  * The highest radiation value often represents the most significant instance in the dataset, requiring attention.
2. Assess Data Extremes:
  * Helps verify whether the peak value aligns with expected patterns or is an anomaly.
3. Support Decision-Making:
  * Knowing the highest value can help prioritize areas for investigation or allocate resources effectively.


In [74]:
# @title
# Find the highest value in i131_data
highest_value = i131_data.max()

# Print the message with the highest value
print(f"The highest registered value of I-131 is: {highest_value}")


The highest registered value of I-131 is: 390
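
Knowing the peak value is often more useful together with where it occurred. The sketch below uses `argmax` to find the position of the maximum and then looks up that row. The DataFrame `demo_df` and all of its values, including the location names, are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for df_assig1
demo_df = pd.DataFrame({
    "State": ["ID", "CA", "HI"],
    "Location": ["Boise", "Anaheim", "Hilo"],
    "I-131": [0, 6, 2],
})
values = demo_df["I-131"].to_numpy()

# argmax returns the position of the first occurrence of the maximum
peak_pos = values.argmax()

# iloc selects the row at that position
peak_row = demo_df.iloc[peak_pos]

print(f"Highest value {values.max()} recorded in {peak_row['Location']}, {peak_row['State']}")
```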


#Question 9
(8 points): Run the following code cell to create an array called q9_array.

In [75]:
# @title
q9_array = np.array([13, 8, 3, 6, 8, 17, 2, 5, 10])

**9a) Create a new array called q9_array_sorted that contains the sorted values of q9_array.**

##Use Case Example: Sorting an Array
###Scenario:
A researcher has an array of radiation measurements (q9_array) and wants to sort the values in ascending order. The goal is to organize the data for better readability or further analysis, such as identifying trends or outliers.

###Why Sorting is Useful:
1. Identify Extremes:
  * Sorting makes it easy to find the smallest and largest values in the dataset.
2. Data Analysis:
  * Sorted data is often required for tasks like calculating percentiles, identifying clusters, or creating visualizations.
3. Improved Readability:
  * Organized data is easier to interpret and analyze.


In [76]:
# @title
# Sort the array and store the result in a new array
# np.sort returns a sorted copy and leaves q9_array unchanged
q9_array_sorted = np.sort(q9_array)

# Output the sorted array
q9_array_sorted


array([ 2,  3,  5,  6,  8,  8, 10, 13, 17])
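
NumPy offers a few related sorting tools. The sketch below is illustrative rather than part of the assignment answer: it contrasts `np.sort` (sorted copy), `np.argsort` (the positions that would sort the array), and a reversed slice for descending order.

```python
import numpy as np

q9_array = np.array([13, 8, 3, 6, 8, 17, 2, 5, 10])

# Sorted copy; the original array keeps its order
sorted_copy = np.sort(q9_array)

# Positions that would sort the array; useful for reordering related arrays
order = np.argsort(q9_array)

# Reversing the sorted copy gives descending order
descending = sorted_copy[::-1]

print(sorted_copy)
print(q9_array[order])
print(descending)
```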

**9b) Create a new array called q9_array_2dim that re-arranges the values of q9_array in a two-dimensional square array (square = with the same number of rows and columns).**

##Use Case Example: Reshaping an Array into a Square Matrix
###Scenario:
A researcher has a one-dimensional array (q9_array) and needs to reshape it into a two-dimensional square array (same number of rows and columns). This format is often required for advanced analysis, such as heatmaps or matrix computations.

###Why Reshaping is Useful:
1. Matrix Analysis:
  * Square matrices are commonly used in mathematical operations, like matrix multiplication or linear algebra applications.
2. Data Visualization:
  * Converting the data into a 2D format makes it easier to visualize patterns (e.g., heatmaps).
3. Logical Grouping:
  * Reshaping groups the data logically for comparisons between rows and columns.

**For Flexible Applications:** Use the dynamic-dimensions code when dealing with arrays of varying sizes, as it is more adaptable. Note that it only works when the number of elements is a perfect square.

In [77]:
# @title
# Step 1: Calculate the total number of elements in the array
total_elements = len(q9_array)  # Total number of elements in q9_array

# Step 2: Calculate the size of each dimension for a square array
dimension_size = int(total_elements**0.5)  # Square root of the total elements gives the size of rows and columns

# Step 3: Reshape the array into a square 2D array
q9_array_2dim = q9_array.reshape(dimension_size, dimension_size)
# Reshape the 1D array into a matrix with 'dimension_size' rows and columns

# Step 4: Output the reshaped 2D array
print(q9_array_2dim)  # Display the 2D array to verify the result



[[13  8  3]
 [ 6  8 17]
 [ 2  5 10]]


**For Beginners or Fixed Data:** Use the static dimensions code (this one) when working with small, fixed-size arrays. It's simpler and easier to follow.

In [78]:
# @title
# Step 1: Define the dimensions for reshaping (manually specified)
rows = 3  # Number of rows in the 2D array
columns = 3  # Number of columns in the 2D array

# Step 2: Reshape the 1D array into a 2D array with the specified rows and columns
q9_array_2dim = q9_array.reshape(rows, columns)

# Step 3: Output the reshaped 2D array
print(q9_array_2dim)  # Display the 2D array to verify the result


[[13  8  3]
 [ 6  8 17]
 [ 2  5 10]]
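
Reshaping into a square only works when the number of elements is a perfect square. The sketch below is one possible way to guard against that, using `math.isqrt` for an exact integer square root and `-1` to let NumPy infer the second dimension. The error message is an illustrative choice.

```python
import math
import numpy as np

q9_array = np.array([13, 8, 3, 6, 8, 17, 2, 5, 10])

# Exact integer square root (avoids floating-point rounding)
side = math.isqrt(q9_array.size)

# Stop early with a clear message if the array cannot form a square
if side * side != q9_array.size:
    raise ValueError(f"{q9_array.size} elements cannot form a square array")

# -1 asks NumPy to work out the number of columns from the array size
square = q9_array.reshape(side, -1)
print(square)
```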


#Question 10
(8 points): Use NumPy methods to generate the following arrays:

10 a) Generate an array with the numbers from 96 to 0, decreasing in steps of 8 (i.e., 96, 88, 80, ...., 0). Call this array a1.

10 b) Generate an array with the number 2 repeated as many times as the size of array a1. Call this array a2.

10 c) Add the arrays created in a) and b) and save the results in a3. Return the values of a3 (i.e., print the values of a3).

Note: To answer question 10 and create the arrays requested in a, b, and c, you __CANNOT__ enter the values of the arrays manually. You __must__ use the NumPy methods we learned in class to create them.

10 a) Generate an array with the numbers from 96 to 0, decreasing in steps of 8 (i.e., 96, 88, 80, ...., 0). Call this array a1.

In [79]:
# @title
# Step 1: Generate an array with numbers from 96 to 0, decreasing in steps of 8
a1 = np.arange(96, -1, -8)  # np.arange generates numbers starting at 96 (inclusive) and stops before -1 (exclusive).
                            # Since the question asks for 96 to 0 (inclusive), we use -1 as the stop value.
                            # The step value of -8 ensures the numbers decrease by 8 at each step.

# Step 2: Output the generated array to verify the result
print(a1)  # Display the array to ensure it contains [96, 88, 80, ..., 0]


[96 88 80 72 64 56 48 40 32 24 16  8  0]
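
`np.arange` excludes its stop value, which is why the cell uses -1 rather than 0. When the number of points is known, `np.linspace` is an alternative that includes both endpoints. The sketch below compares the two; the name `a1_alt` is introduced only for illustration.

```python
import numpy as np

# arange: start, stop (excluded), step
a1 = np.arange(96, -1, -8)

# linspace: start, stop (included), number of evenly spaced points
a1_alt = np.linspace(96, 0, num=13, dtype=int)

print(a1)
print(a1_alt)
print(np.array_equal(a1, a1_alt))
```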


###Use cases:

1. Financial Planning
* **Example:** A financial advisor models a scenario where a client’s savings decrease by $8 each month due to expenses, starting from $96.
* **Use Case:** The sequence [96, 88, 80, ..., 0] simulates monthly balances, helping plan budgets or identify when funds will run out.
---

2. Fitness or Training Programs
* **Example:** A coach sets up a program where an athlete decreases their training time by 8 minutes each session, starting at 96 minutes.
* **Use Case:** The sequence [96, 88, 80, ..., 0] represents session durations, ensuring a gradual reduction over time.
---
3. Environmental Monitoring
* **Example:** An environmental scientist measures pollutant levels in a lake, which decrease at a constant rate of 8 units per week, starting from an initial reading of 96.
* **Use Case:** The sequence [96, 88, 80, ..., 0] models the reduction in pollutant levels over time, helping evaluate the effectiveness of cleanup efforts.
---
4. Inventory Management
* **Example:** A retailer tracks the reduction in stock levels for a product that decreases by 8 units each week, starting from 96.
* **Use Case:** The sequence [96, 88, 80, ..., 0] predicts when the stock will run out, helping schedule restocking.
---
###Summary
This type of sequence generation is valuable for:

* Modeling predictable patterns.
* Simulating stepwise changes.
* Planning and scheduling tasks or resources.

By creating structured sequences, you can simplify calculations, analyze trends, and make better predictions in various fields.


10 b) Generate an array with the number 2 repeated as many times as the size of array a1. Call this array a2.

In [80]:
# @title
# Step 1: Get the size of array a1
size_of_a1 = len(a1)  # len(a1) returns the total number of elements in array a1

# Step 2: Generate an array with the number 2 repeated as many times as the size of a1
a2 = np.full(size_of_a1, 2)  # np.full creates an array of specified size filled with the specified value (2 in this case)

print(a2)


[2 2 2 2 2 2 2 2 2 2 2 2 2]


###Use Cases

1. Initializing Inventory (Retail/Logistics)
* **Example:** A store restocks an item with a default minimum quantity of 2 across multiple locations.
* **Use Case:** An array [2, 2, 2, ...] can represent the initial inventory levels for each store in a chain.
---
2. Placeholder Values for Simulations (Data Analysis)
* **Example:** A data analyst runs a simulation where a variable starts at the same value (2) for all data points.
* **Use Case:** The array [2, 2, 2, 2, 2] is used as a baseline input for each iteration of the simulation.
---
3. Repeated Alerts or Thresholds (Monitoring Systems)
* **Example:** A monitoring system uses a threshold of 2 units for triggering alerts across all sensors.
* **Use Case:** The array [2, 2, 2, ...] stores the default threshold for all sensors, ensuring consistency in monitoring.
---
4. Budget Initialization (Finance)
* **Example:** A financial planner allocates an initial placeholder budget of $2,000 for each department in a company.
* **Use Case:** The array [2000, 2000, 2000, ...] represents initial allocations before adjustments.
---
###Summary
This concept of generating repeated values is widely applicable in:

* Data preparation
* Simulation and modeling
* System initialization
* Default values setup

10 c) Add the arrays created in a) and b) and save the results in a3. Return the values of a3 (i.e., print the values of a3).

In [81]:
a3 = a1 + a2  # Element-wise addition: each value of a1 is added to the matching value of a2
print(a3)

[98 90 82 74 66 58 50 42 34 26 18 10  2]
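
The addition above pairs elements position by position. NumPy can also add a single number to every element through broadcasting, which gives the same result without building a second array. The sketch below compares the two approaches; `a3_broadcast` and `a2_like` are names introduced only for illustration, and the assignment itself still asks for an explicit a2.

```python
import numpy as np

a1 = np.arange(96, -1, -8)
a2 = np.full(a1.size, 2)

# Element-wise addition of two arrays of the same length
a3 = a1 + a2

# Broadcasting: the scalar 2 is applied to every element of a1
a3_broadcast = a1 + 2

# full_like builds an array of 2s with the same shape and dtype as a1
a2_like = np.full_like(a1, 2)

print(a3)
print(np.array_equal(a3, a3_broadcast))
```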


###Use Cases

1. Adjusting Inventory Levels (Retail/Logistics)
* **Example:** A store tracks current stock levels (a1) and needs to account for a fixed restock amount (a2), such as 2 units per product.
* **Use Case:**
a1 = [96, 88, 80] (current inventory levels)
a2 = [2, 2, 2] (restock per item)
* **Result (a3):** [98, 90, 82] (updated inventory levels after restocking).
---
2. Incremental Growth Modeling (Finance)
* **Example:** A financial planner models monthly savings where a base amount is added to current balances.
* **Use Case:**
a1 = [96, 88, 80] (current balances)
a2 = [2, 2, 2] (monthly deposit)
* **Result (a3):** [98, 90, 82] (balances after deposits).
---
3. Scaling Energy Consumption (Environmental Monitoring)
* **Example:** Energy usage (a1) across locations is adjusted with additional demand (a2) for each site.
* **Use Case:**
a1 = [96, 88, 80] (current energy usage in kWh)
a2 = [2, 2, 2] (additional demand per location)
* **Result (a3):** [98, 90, 82] (total adjusted energy usage).
---
4. Updating Base Pay Rates (Human Resources)
* **Example:** A company increases employee salaries by a fixed amount (a2) across departments.
* **Use Case:**
a1 = [96, 88, 80] (current salaries)
a2 = [2, 2, 2] (pay raise for all employees)
* **Result (a3):** [98, 90, 82] (updated salaries).
---
5. Sensor Calibration (IoT/Technology)
* **Example:** Sensor readings are adjusted with a fixed calibration offset (a2) to improve accuracy.
* **Use Case:**
a1 = [96, 88, 80] (raw sensor readings)
a2 = [2, 2, 2] (calibration offsets)
* **Result (a3):** [98, 90, 82] (calibrated readings).
---
###Summary
The addition of two arrays is a powerful tool that applies to scenarios requiring adjustments, updates, or incremental changes to existing data. It simplifies calculations by automating element-wise operations.


#Question 11
 (5 points): Return how many observations were taken at each State. __Write only one line of code__ to answer this question.

In [82]:
# @title
# Step 1: Standardize the 'State' column so stray spaces or mixed capitalization do not split one state into several groups
df_assig1['State'] = df_assig1['State'].str.strip().str.title()
# .str.strip() removes any leading or trailing spaces.
# .str.title() capitalizes only the first letter, so 'CA' becomes 'Ca' (see the output below).
# Note: this overwrites the column, so earlier cells that filter on 'CA' would no longer match if re-run after this cell.

# Step 2: Count occurrences of each state and sort by state name
state_counts = df_assig1['State'].value_counts().sort_index()  # Counts occurrences and sorts alphabetically by state name

# Step 3: Display the counts
print("Occurrences of each state, ordered by name:")
print(state_counts)


Occurrences of each state, ordered by name:
State
Ak      71
Al      20
Ar      10
Az       3
Ca      42
Cnmi    19
Co       6
Ct      11
De       4
Fl      11
Ga       8
Guam    24
Hi      49
Ia       4
Id      29
Il       4
Ks       9
Ky       2
La       2
Ma      10
Md       6
Mi       9
Mn       8
Mo       4
Ms       4
Mt       2
Nc      12
Nd       2
Ne       2
Nh      10
Nj       5
Nm       4
Nv      23
Ny      22
Oh      19
Ok       2
Or       6
Pa      16
Ri       2
Sc       8
Tn      56
Tx       7
Ut       9
Va       6
Vt       1
Wa      19
Wi       2
Wv       2
Name: count, dtype: int64


In [83]:
# @title
# Step 1: Count occurrences of each state; value_counts() orders the result by count, most frequent first
state_counts = df_assig1['State'].value_counts()

# Step 2: Display the counts
print("Occurrences of each state, ordered by count:")
print(state_counts)


Occurrences of each state, ordered by count:
State
Ak      71
Tn      56
Hi      49
Ca      42
Id      29
Guam    24
Nv      23
Ny      22
Al      20
Wa      19
Oh      19
Cnmi    19
Pa      16
Nc      12
Ct      11
Fl      11
Ar      10
Nh      10
Ma      10
Mi       9
Ks       9
Ut       9
Mn       8
Ga       8
Sc       8
Tx       7
Va       6
Or       6
Md       6
Co       6
Nj       5
De       4
Ms       4
Mo       4
Ia       4
Nm       4
Il       4
Az       3
Ok       2
Ri       2
Wi       2
Nd       2
Ne       2
Mt       2
Wv       2
Ky       2
La       2
Vt       1
Name: count, dtype: int64
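The two cells above differ only in ordering. The sketch below reproduces that difference on a small made-up DataFrame (`toy` is hypothetical and not taken from the RadNet file). It also normalizes the codes with `.str.upper()` instead of `.str.title()`, which is an alternative choice that keeps postal abbreviations in their usual form ("AK" rather than "Ak").

```python
import pandas as pd

# Hypothetical toy data with inconsistent spacing and capitalization
toy = pd.DataFrame({"State": [" tn", "CA", "TN ", "ak", "ca", "Tn"]})

# Normalize: strip surrounding spaces, then upper-case the codes
toy["State"] = toy["State"].str.strip().str.upper()

by_count = toy["State"].value_counts()               # most frequent first
by_name = toy["State"].value_counts().sort_index()   # alphabetical by state

print(by_count.to_dict())  # {'TN': 3, 'CA': 2, 'AK': 1}
print(by_name.to_dict())   # {'AK': 1, 'CA': 2, 'TN': 3}
```

If the column is already clean, the one-line answer is just the `value_counts()` call.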


#Question 12
(5 points): Return the __df_assig1__ data frame sorted by the 'Date Collected' column, showing the most recently collected observations first. __Write only one line of code__ to answer this question.

In [84]:
# @title
# Step 1: Sort the DataFrame by the 'Date Collected' column in descending order
sorted_df = df_assig1.sort_values(by='Date Collected', ascending=False)
# .sort_values() is used to sort a DataFrame by the values in a specific column.
# by='Date Collected' specifies the column to sort.
# ascending=False sorts the values in descending order, showing the latest dates first.

# Step 2: Display the sorted DataFrame
print(sorted_df)  # Prints the sorted DataFrame to verify that the latest dates appear first.


    State     Location Date Posted Date Collected     Sample Type    Unit  \
577    Pa   Harrisburg  2011-05-24     2011-04-30  Precipitation    pCi/l   
570    Nc    Charlotte  2011-05-24     2011-04-29  Precipitation    pCi/l   
584    Tn    Knoxville  2011-05-24     2011-04-29  Precipitation    pCi/l   
568    Ma       Boston  2011-05-24     2011-04-29  Precipitation    pCi/l   
599    Oh  Painesville  2011-05-24     2011-04-29  Precipitation    pCi/l   
..    ...          ...         ...            ...             ...     ...   
539    Oh  Painesville  2011-04-04     2011-03-15   Precipitation   pCi/l   
510    Tn    Nashville  2011-04-13     2011-03-15   Precipitation   pCi/l   
213    Ca    Riverside  2011-03-30     2011-03-15      Air Filter  pCi/m3   
543    Ca     Richmond  2011-04-04     2011-03-15   Precipitation   pCi/l   
24     Ca      Anaheim  2011-03-30     2011-03-11      Air Filter  pCi/m3   

     Ba-140  Co-60  Cs-134  Cs-136  Cs-137  I-131  I-132  I-133  Te-129  \


#Question 13
(5 points): Return the __df_assig1__ data frame only showing the samples taken in Miami. Miami is one of the values from the column 'Location'. __Write only one line of code__ to answer this question.

In [85]:
# @title
# Step 1: Filter the DataFrame to only include rows where 'Location' is 'Miami'
miami_samples = df_assig1[df_assig1['Location'] == 'Miami']
# df_assig1['Location'] == 'Miami' creates a boolean mask that is True for rows where the 'Location' column equals 'Miami'.
# df_assig1[...] applies this mask to filter the rows where the condition is True.

# Step 2: Display the filtered DataFrame
print(miami_samples)  # Prints the filtered DataFrame to show only the samples taken in Miami.



    State Location Date Posted Date Collected     Sample Type   Unit  Ba-140  \
312    Fl    Miami  2011-04-08     2011-03-29  Drinking Water  pCi/l     0.0   

     Co-60  Cs-134  Cs-136  Cs-137  I-131  I-132  I-133  Te-129  Te-129m  \
312    0.0       0     0.0       0      0      0    0.0     0.0      0.0   

     Te-132  Ba-140.1  
312       0       NaN  
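To make the boolean-mask step concrete, the sketch below builds the mask separately and then applies it. The DataFrame `toy` and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data, not read from the RadNet file
toy = pd.DataFrame({
    "Location": ["Miami", "Boston", "Miami", "Anaheim"],
    "I-131": [0.0, 14.0, 1.5, 7.0],
})

# The comparison returns one True/False value per row
mask = toy["Location"] == "Miami"
print(mask.tolist())  # [True, False, True, False]

# Indexing with the mask keeps only the rows marked True
miami = toy[mask]
print(len(miami))  # 2
```

The original row labels are preserved in the filtered result, which is why the single Miami row above still carries index 312.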


#Question 14
(5 points): Return the average concentration of 'I-131' found in each state. __Write only one line of code__ to answer this question.

In [86]:
# @title
# Step 1: Group the data by 'State' and calculate the mean of 'I-131'
average_concentration = df_assig1.groupby('State')['I-131'].mean()
# .groupby('State') groups the data by the 'State' column.
# ['I-131'] selects the 'I-131' column for calculations.
# .mean() calculates the average (mean) concentration of 'I-131' for each state.

# Step 2: Display the results
print(average_concentration)  # Prints the average 'I-131' concentration for each state.


State
Ak       0.154930
Al       1.050000
Ar       5.900000
Az       1.000000
Ca       7.119048
Cnmi     0.105263
Co       8.333333
Ct       5.454545
De       0.000000
Fl      15.636364
Ga       2.125000
Guam     0.041667
Hi       0.591837
Ia       0.000000
Id      23.068966
Il       0.000000
Ks      24.444444
Ky       0.000000
La       0.000000
Ma      14.400000
Md       0.000000
Mi       2.777778
Mn       7.125000
Mo       0.750000
Ms       0.000000
Mt       0.000000
Nc       7.500000
Nd       0.000000
Ne       0.000000
Nh       7.100000
Nj       0.000000
Nm       0.000000
Nv       0.173913
Ny       3.227273
Oh       2.789474
Ok       0.000000
Or      14.500000
Pa       0.750000
Ri       0.000000
Sc       0.000000
Tn       7.535714
Tx       0.000000
Ut      24.444444
Va       1.500000
Vt       0.000000
Wa      10.368421
Wi       0.000000
Wv       0.000000
Name: I-131, dtype: float64
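The group-then-aggregate step can be checked by hand on a small example. In this sketch `toy` and its values are made up so that the means are easy to verify.

```python
import pandas as pd

# Hypothetical toy data, not read from the RadNet file
toy = pd.DataFrame({
    "State": ["FL", "FL", "KS", "KS", "KS"],
    "I-131": [10.0, 20.0, 0.0, 30.0, 60.0],
})

# One mean per state: FL = (10 + 20) / 2, KS = (0 + 30 + 60) / 3
avg = toy.groupby("State")["I-131"].mean()
print(avg.to_dict())  # {'FL': 15.0, 'KS': 30.0}
```

Note that zero readings count toward the mean, so a state with many zero-valued samples reports a low average even if a few samples were high.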
