In [8]:
# Task 2: Data Transformation & Feature Engineering
# Goal: Create "Solar Access Score" to prioritize regions for solar deployment

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

prime_data = pd.read_csv("C:/Users/HP/Downloads/PrimeFrontier_SolarDeploymentDataset.csv")


print("Dataset overview:")
print(prime_data.info())
print(prime_data.head())

features_to_scale = ['Solar_Irradiance_kWh_m2_day', 'Grid_Access_Percent', 'Infrastructure_Index', 'Electricity_Cost_USD_per_kWh']
scaler = MinMaxScaler()
scaled_values = scaler.fit_transform(prime_data[features_to_scale])

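# Build the scaled columns with explicit names and the original index,
# so the concat below aligns row by row even if the index is not 0..n-1
prime_data_scaled = pd.DataFrame(
    scaled_values,
    columns=[f'{col}_Scaled' for col in features_to_scale],
    index=prime_data.index,
)
prime_data = pd.concat([prime_data, prime_data_scaled], axis=1)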

# Computing Solar Access Score:
# Weighting logic:
# Solar Irradiance (35%), Inverse Grid Access (25%), Infrastructure Index (20%), Electricity Cost (20%)
prime_data['Solar_Access_Score'] = (
    0.35 * prime_data['Solar_Irradiance_kWh_m2_day_Scaled'] +
    0.25 * (1 - prime_data['Grid_Access_Percent_Scaled']) +  # inverse because low grid access is prioritized
    0.20 * prime_data['Infrastructure_Index_Scaled'] +
    0.20 * prime_data['Electricity_Cost_USD_per_kWh_Scaled']
)


prime_data_sorted = prime_data.sort_values(by='Solar_Access_Score', ascending=False)

print("Top 10 regions by Solar Access Score:")
print(prime_data_sorted[['Region', 'Solar_Access_Score']].head(10))









Dataset overview:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 7 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Region                        50 non-null     object 
 1   Solar_Irradiance_kWh_m2_day   50 non-null     float64
 2   Rural_Pop_Density_per_km2     50 non-null     int64  
 3   Grid_Access_Percent           50 non-null     float64
 4   Infrastructure_Index          50 non-null     float64
 5   Electricity_Cost_USD_per_kWh  50 non-null     float64
 6   Terrain_Ruggedness_Score      50 non-null     float64
dtypes: float64(5), int64(1), object(1)
memory usage: 2.9+ KB
None
     Region  Solar_Irradiance_kWh_m2_day  Rural_Pop_Density_per_km2  \
0  Region_1                         6.00                         90   
1  Region_2                         5.36                        206   
2  Region_3                         6.15                         64  

# Solar Access Score: Business Rationale

The “Solar Access Score” is a composite metric created to identify and prioritize West African regions best suited for solar energy deployment based on available data.

This score combines multiple factors with assigned weights to capture both the potential for solar energy generation and the socio-economic conditions impacting feasibility:

- **Solar Irradiance (35%)**: Measured in kWh/m²/day, this is the key environmental factor driving solar power potential. Higher irradiance values mean more available solar energy.
- **Inverse Grid Access (25%)**: Grid access percentage indicates how connected a region is to the electrical grid. Regions with lower grid access (for example, below 50%) have a greater need for alternative energy solutions like solar. The score therefore uses the inverse of this metric, computed as 1 minus the min-max scaled value, so lower access raises the score continuously rather than through a hard cutoff.
- **Infrastructure Index (20%)**: A score between 0 and 1 that represents the quality of infrastructure necessary for deploying and maintaining solar installations. Better infrastructure supports easier project implementation.
- **Electricity Cost (20%)**: The cost of electricity in USD per kWh affects the economic attractiveness of solar energy. Higher electricity costs make solar installations more financially viable and attractive.
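
Written out, the weighting above amounts to the expression below. The symbols $I$, $G$, $F$, and $C$ are shorthand introduced here for irradiance, grid access, infrastructure index, and electricity cost; they are not names from the dataset. A prime marks the min-max scaled value of a feature across all regions.

$$
S = 0.35\,I' + 0.25\,(1 - G') + 0.20\,F' + 0.20\,C', \qquad x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
$$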

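By combining these min-max scaled, weighted features into a single Solar Access Score, we create a balanced indicator highlighting locations where solar deployment would have the highest technical potential and social impact. Because the four weights sum to 1 and every scaled feature lies between 0 and 1, the score itself also falls between 0 and 1.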
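
As a rough illustration of how the score behaves, the sketch below recomputes it in plain Python on three made-up regions. The region names, the feature values, and the helpers `min_max` and `solar_access_scores` are assumptions for this example only. The notebook itself uses `MinMaxScaler` and pandas, and mapping a constant column to 0 is a choice made for this sketch.

```python
# Illustrative sketch only: regions and values below are made up,
# not rows from the PrimeFrontier dataset.

WEIGHTS = {
    "irradiance": 0.35,
    "grid_access": 0.25,      # applied to (1 - scaled value)
    "infrastructure": 0.20,
    "cost": 0.20,
}


def min_max(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: map everything to 0 (an assumption of this sketch).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def solar_access_scores(regions):
    """Return {region name: score} using the weights described above."""
    scaled = {key: min_max([r[key] for r in regions]) for key in WEIGHTS}
    scores = {}
    for i, region in enumerate(regions):
        scores[region["name"]] = (
            WEIGHTS["irradiance"] * scaled["irradiance"][i]
            + WEIGHTS["grid_access"] * (1 - scaled["grid_access"][i])
            + WEIGHTS["infrastructure"] * scaled["infrastructure"][i]
            + WEIGHTS["cost"] * scaled["cost"][i]
        )
    return scores


toy_regions = [
    {"name": "A", "irradiance": 6.0, "grid_access": 20.0, "infrastructure": 0.8, "cost": 0.30},
    {"name": "B", "irradiance": 5.0, "grid_access": 80.0, "infrastructure": 0.4, "cost": 0.10},
    {"name": "C", "irradiance": 5.5, "grid_access": 50.0, "infrastructure": 0.6, "cost": 0.20},
]

scores = solar_access_scores(toy_regions)
# Highest score first, mirroring the descending sort in the notebook.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy data, region A leads on every factor (including having the lowest grid access), so it ranks first, while region B trails on every factor and ranks last.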

---

## Summary of Dataset

- Number of regions: 50
- Key features (the score uses four of them; Rural Population Density and Terrain Ruggedness Score are not part of it):
  - Solar Irradiance (kWh/m²/day)
  - Rural Population Density (per km²)
  - Grid Access (% of population)
  - Infrastructure Index (0–1 scale)
  - Electricity Cost (USD/kWh)
  - Terrain Ruggedness Score
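
Since the score depends on specific column names, a small guard before scoring can catch a renamed or missing column early. The sketch below is one possible check. The helper name `missing_columns` is hypothetical, and the header list is copied from the `info()` output shown earlier.

```python
# Columns the scoring step reads, plus the region label used in the ranking.
REQUIRED_COLUMNS = [
    "Region",
    "Solar_Irradiance_kWh_m2_day",
    "Grid_Access_Percent",
    "Infrastructure_Index",
    "Electricity_Cost_USD_per_kWh",
]


def missing_columns(header, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `header`."""
    present = set(header)
    return [col for col in required if col not in present]


# Header as reported by prime_data.info() in the notebook output.
dataset_header = [
    "Region",
    "Solar_Irradiance_kWh_m2_day",
    "Rural_Pop_Density_per_km2",
    "Grid_Access_Percent",
    "Infrastructure_Index",
    "Electricity_Cost_USD_per_kWh",
    "Terrain_Ruggedness_Score",
]

missing = missing_columns(dataset_header)
if missing:
    raise ValueError(f"Dataset is missing required columns: {missing}")
```

With a DataFrame in hand, the same check could be run as `missing_columns(prime_data.columns)`.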

---

This score supports data-driven decision-making to efficiently allocate renewable energy investments in regions where they will be most beneficial.



In [None]:
# Save the enhanced dataset to a CSV
prime_data.to_csv("PrimeFrontier_SolarDeploymentDataset_Scored.csv", index=False)
