# Part 4: Feature Engineering and Data Integration

## Introduction

### Scenario

Your initial assessment has successfully identified a high-priority search area. Command has redirected search teams to this new zone, and a second batch of field reports has just arrived. Your mission is to integrate this new data with the first batch and narrow the search location further.

### Coding Task Overview

You will import both of your functions from your utility file. Your task is to load and clean the second batch of data, then combine it with the cleaned data from the first batch. Finally, you will re-run your EDA function on the combined dataset to produce an updated assessment that narrows the search region.

### Mission Deliverable

An Updated Assessment Briefing. You must produce the visualizations from your EDA function on the combined dataset and write a markdown summary that defines a tight search region and describes the evidence supporting your assessment.


In [None]:
# Import pandas, numpy
import pandas as pd
import numpy as np
# From our utility file, import both the clean_data and perform_eda functions
from uav_analysis_tools import clean_data, perform_eda 
from uav_mapping_tools import EDA_makeMap  

import os



### STEP 1: Load All Datasets

STUDENT CODE REQUIRED

In this step you will:
1.  use pandas ```read_csv()``` to load your previously cleaned dataset CSV into a dataframe ```df_batch1```
2.  use pandas ```read_csv()``` to load the new data ```field_reports_batch_2.csv``` into a dataframe ```df_batch2_raw```
3.  use the pandas ```.info()``` method to get an overview of the new raw data

Remember that these data files are stored in the ```data``` subdirectory, so you will need the ```os``` 
package's ```os.path.join()``` function to build the correct file paths for pandas' CSV reading function.
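
As a warm-up, here is a minimal sketch of the path-building and loading pattern on a throwaway file. The file name ```toy_reports.csv``` and its columns are made up for illustration and are not part of the mission data; the sketch writes to a temporary folder so it does not touch your real ```data``` directory.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the data directory, created inside a temp folder
# so this sketch runs anywhere without touching real files.
with tempfile.TemporaryDirectory() as tmp_root:
    demo_dir = os.path.join(tmp_root, "data")
    os.makedirs(demo_dir)

    # os.path.join() inserts the correct separator for the operating system.
    demo_path = os.path.join(demo_dir, "toy_reports.csv")

    # Write a tiny toy CSV (column names are made up for illustration).
    pd.DataFrame(
        {"report_id": [1, 2, 3], "signal_strength": [0.4, None, 0.9]}
    ).to_csv(demo_path, index=False)

    # Load it back into a DataFrame.
    df_demo = pd.read_csv(demo_path)

# .info() prints column names, non-null counts, and dtypes.
df_demo.info()
```

In the ```.info()``` output, a non-null count lower than the number of rows is usually the first hint that a column needs cleaning.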

In [None]:
DATA_DIR = None #placeholder for the data directory
datafilepath = None #placeholder for the data file path 
datafilepath2 = None #placeholder for the second data file path

df_batch1 = None # placeholder
df_batch2_raw = None  # placeholder

# Load the cleaned_reports_batch_1.csv file into a DataFrame called df_batch1.
# Load the new messy field_reports_batch_2.csv file into a DataFrame called df_batch2_raw.
# Display the .info() for the new raw data to see its messy state.
##########################################################
##### START STUDENT CODE HERE:


#Set the DATA_DIR variable to 'data'
DATA_DIR = None # placeholder

#create the datafilepath variable by joining DATA_DIR and 'cleaned_reports_batch_1.csv' using os.path.join()
datafilepath = None # placeholder
df_batch1 = None # placeholder

datafilepath2 = None # placeholder
df_batch2_raw = None # placeholder

#display the .info() for the new raw data to see its messy state.
 

##### END STUDENT CODE HERE
##########################################################


### STEP 2: Clean the New Data Batch

STUDENT CODE REQUIRED

```uav_analysis_tools.py``` was imported earlier in this notebook. Run the ```clean_data``` function that you wrote previously in ```uav_analysis_tools.py```
and store the result in a dataframe called ```df_batch2_clean```. Then use the pandas ```.info()``` method on the cleaned dataset so that
you can compare the cleaned and raw datasets.
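
If reading two ```.info()``` printouts side by side is awkward, one option is to build a small comparison table. The sketch below uses toy data and a hypothetical stand-in function ```toy_clean()```; it is not your ```clean_data``` function, and the column names are assumptions made only for illustration.

```python
import pandas as pd

# Toy "raw" data; the columns are made up for illustration.
toy_raw = pd.DataFrame(
    {"signal_strength": ["0.4", "bad", "0.9"], "latitude": [34.1, None, 34.3]}
)


def toy_clean(df):
    """Hypothetical stand-in for clean_data(); your function will differ."""
    out = df.copy()
    # Convert text to numbers; unparseable entries become NaN.
    out["signal_strength"] = pd.to_numeric(out["signal_strength"], errors="coerce")
    # Drop rows with any missing value and rebuild the index.
    return out.dropna().reset_index(drop=True)


toy_cleaned = toy_clean(toy_raw)

# Side-by-side summary of dtypes and missing values before and after cleaning.
comparison = pd.DataFrame(
    {
        "raw_dtype": toy_raw.dtypes.astype(str),
        "clean_dtype": toy_cleaned.dtypes.astype(str),
        "raw_nulls": toy_raw.isna().sum(),
        "clean_nulls": toy_cleaned.isna().sum(),
    }
)
print(comparison)
print(f"Rows: {len(toy_raw)} raw -> {len(toy_cleaned)} cleaned")
```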

In [None]:
# Use your imported clean_data function to clean the df_batch2_raw DataFrame.
# Store the result in a new DataFrame called df_batch2_clean.
# Display the .info() for the newly cleaned data to verify the result.

##########################################################
##### START STUDENT CODE HERE:

df_batch2_clean = None # placeholder

#display the .info() for the newly cleaned data to verify the result.


##### END STUDENT CODE HERE
##########################################################



### STEP 3: Combine the Datasets

STUDENT CODE REQUIRED

Use ```pd.concat()``` to combine ```df_batch1``` and ```df_batch2_clean``` into a single DataFrame called ```df_combined```.
Be sure to ignore the index to create a new, clean index using ```ignore_index=True```.
Print the shape of all three DataFrames using the built-in ```.shape``` attribute to confirm that the concatenation was successful.
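
To see why ```ignore_index=True``` matters, here is a small sketch on two toy DataFrames. The names ```toy_a``` and ```toy_b``` are made up for illustration.

```python
import pandas as pd

toy_a = pd.DataFrame({"signal_strength": [0.2, 0.5]})
toy_b = pd.DataFrame({"signal_strength": [0.7, 0.9, 0.1]})

# Without ignore_index, the original row labels are kept, so they repeat.
stacked_keep = pd.concat([toy_a, toy_b])
print(list(stacked_keep.index))  # [0, 1, 0, 1, 2]

# With ignore_index=True, the result gets a fresh 0..n-1 index.
stacked_new = pd.concat([toy_a, toy_b], ignore_index=True)
print(list(stacked_new.index))  # [0, 1, 2, 3, 4]

# .shape is an attribute (no parentheses) that returns (rows, columns).
print(toy_a.shape, toy_b.shape, stacked_new.shape)  # (2, 1) (3, 1) (5, 1)
```

Duplicate index labels can cause surprises later, for example when selecting rows with ```.loc```, which is why a fresh index is the safer choice here.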

In [None]:
# Use pd.concat to combine df_batch1 and df_batch2_clean into a single DataFrame
# called df_combined.
# Be sure to ignore the index to create a new, clean index.
# Print the shape of all three DataFrames to confirm the concatenation was successful.
df_combined = pd.concat([df_batch1, df_batch2_clean], ignore_index=True)
print(f"Batch 1 shape: {df_batch1.shape}")
print(f"Batch 2 shape: {df_batch2_clean.shape}")
print(f"Combined shape: {df_combined.shape}")   

##########################################################
##### START STUDENT CODE HERE:

df_combined = None # placeholder

##### END STUDENT CODE HERE
##########################################################

#check to see that the combined data's shape is the sum of the two individual dataframes and print the result of whether the check succeeds or not
expected_rows = df_batch1.shape[0] + df_batch2_clean.shape[0]
actual_rows = df_combined.shape[0]
if expected_rows == actual_rows:
    print("Concatenation successful: row counts match.")
else:
    print("Concatenation error: row counts do not match.")



### STEP 5a: Map the locations of the reports in the combined dataset

No Student Code Required

In [None]:
# Generate the map visualizations using the EDA_makeMap function on the df_combined DataFrame

m = EDA_makeMap(df_combined,force_regenerate=True)
display(m)

### STEP 5b: Run Exploratory Data Analysis (EDA) on Engineered Dataset

STUDENT CODE REQUIRED

In this step you will apply your imported ```perform_eda()``` function to the final ```df_combined``` DataFrame.
The target column is still ```'signal_strength'```.

The function should automatically detect and plot the new engineered features.

In [None]:
# Apply your imported perform_eda function to the final df_combined DataFrame.
# The target column is still 'signal_strength'.
# The function should automatically detect and plot the new engineered features.

##########################################################
##### START STUDENT CODE HERE:

perform_eda(df_combined, 'signal_strength')   

##### END STUDENT CODE HERE
##########################################################





### STEP 6: Mission Deliverable - Updated Assessment Briefing

STUDENT MARKDOWN REQUIRED

In the Markdown cell below, write a short mission assessment in which you provide:
1.  An overview of the assessment process
2.  Evidence for your assessment based on the map and other EDA information
3.  A bounding box that includes your guess about where the rescue beacon is located (lat/long coordinates of two opposing corners of the box)



Note that clicking on the interactive map from Step 5a will display lat/long coordinates that you can use to define the bounding box.
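
Once you have two corner coordinates, you may want to check how many reports fall inside your box. The sketch below is one way to do that on toy data; the column names ```latitude``` and ```longitude``` and all coordinate values are assumptions for illustration, so substitute the column names in your own dataset.

```python
import pandas as pd

# Toy reports; column names and coordinates are made up for illustration.
toy_reports = pd.DataFrame(
    {
        "latitude": [34.10, 34.25, 34.40, 34.27],
        "longitude": [-117.50, -117.30, -117.10, -117.28],
        "signal_strength": [0.2, 0.8, 0.1, 0.9],
    }
)

# Two opposing corners as (lat, lon), e.g. read off the map (hypothetical values).
corner_1 = (34.20, -117.35)
corner_2 = (34.30, -117.25)

# Sort each axis so the corners can be given in either order.
lat_min, lat_max = sorted([corner_1[0], corner_2[0]])
lon_min, lon_max = sorted([corner_1[1], corner_2[1]])

# Boolean mask: True for reports inside the box (bounds are inclusive).
in_box = toy_reports["latitude"].between(lat_min, lat_max) & toy_reports[
    "longitude"
].between(lon_min, lon_max)

print(f"{in_box.sum()} of {len(toy_reports)} reports fall inside the box")
print(toy_reports.loc[in_box, "signal_strength"].mean())
```

Comparing the mean signal strength inside and outside the box is one simple piece of evidence you could cite in the briefing.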


# Updated Assessment Briefing

**TO:** Mission Command
**FROM:** Data Analysis Unit
**SUBJECT:** Updated Search Area Recommendation Based on Integrated Field Data (Batch 1 & 2)

### 1. Assessment Overview

<span style="color: green;">

STUDENT TEXT HERE

</span>





### 2. Evidence

<span style="color: green;">

STUDENT TEXT HERE

</span>



### 3. Final Recommendation for refined search area



<span style="color: green;">

STUDENT TEXT HERE

</span>




### STEP 7: Save the Final Combined Dataset

No Student Code Required
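
The cell below saves with ```index=False```. As a quick illustration of what that option changes, here is a sketch that writes a toy DataFrame both ways to a temporary folder and reads the column names back. The file names are made up for illustration.

```python
import os
import tempfile

import pandas as pd

toy_df = pd.DataFrame({"signal_strength": [0.2, 0.8]})

with tempfile.TemporaryDirectory() as tmp_dir:
    with_index = os.path.join(tmp_dir, "with_index.csv")
    without_index = os.path.join(tmp_dir, "without_index.csv")

    toy_df.to_csv(with_index)  # default: the index is written as the first column
    toy_df.to_csv(without_index, index=False)  # the index is omitted

    cols_with = list(pd.read_csv(with_index).columns)
    cols_without = list(pd.read_csv(without_index).columns)

print(cols_with)  # ['Unnamed: 0', 'signal_strength']
print(cols_without)  # ['signal_strength']
```

Omitting the index avoids a stray ```Unnamed: 0``` column the next time the file is loaded.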

In [None]:
# Save the df_combined DataFrame to a new CSV file named 
# 'final_combined_data.csv'.
# Do not include the pandas index.
# Print a confirmation message.


#Set the DATA_DIR variable to 'data'
DATA_DIR = 'data'

#create the datafilepath variable by joining DATA_DIR and 'final_combined_data.csv' using os.path.join()
datafilepath = os.path.join(DATA_DIR, 'final_combined_data.csv')
df_combined.to_csv(datafilepath, index=False)

print(f"Final combined data saved to {datafilepath}")
