## Reading the DHS + Nightlights dataset

**If the DHS + Nightlights data has not been generated yet,** follow the steps [below](#generating_dataset) to generate the dataset first.
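The check described above can be made explicit before reading the file. The following is only a minimal sketch: `dataset_is_available` is a hypothetical helper, not part of `poverty_predictor`, and the temporary files exist purely to demonstrate both outcomes.

```python
from pathlib import Path
import tempfile


def dataset_is_available(csv_path: str) -> bool:
    """Hypothetical helper: True if csv_path is an existing, non-empty file."""
    path = Path(csv_path)
    return path.is_file() and path.stat().st_size > 0


# Demonstrate both outcomes with throwaway files (not the real dataset)
with tempfile.TemporaryDirectory() as tmp_dir:
    missing = Path(tmp_dir) / "missing.csv"
    present = Path(tmp_dir) / "present.csv"
    present.write_text("Mean_nightlight,Wealth Score\n")
    missing_result = dataset_is_available(str(missing))
    present_result = dataset_is_available(str(present))

print(missing_result, present_result)
```

In the notebook, the same helper could be called with the CSV path used in the read cell; if it returns `False`, run the generation steps first.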

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
dhs_nightlights_df = pd.read_csv("../data/DHS_Nightlights/Malawi/DHS_Nightlights_data.csv")

In [None]:
dhs_nightlights_df.head()

In [None]:
sns.regplot(x="Mean_nightlight", y="Wealth Score", data=dhs_nightlights_df, color="blue")
plt.xlabel("Average Nighttime Luminous Intensity")
plt.ylabel("Average Wealth of Clusters");
# plt.savefig('avgwealth_luminosity.png')

## Running EDA using Pandas profiling

In [None]:
import pandas_profiling

In [None]:
dhs_nightlights_df.profile_report()

**The pandas-profiling report and the regression plot both suggest a correlation between the wealth score and nightlight intensity.**
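To put a number on that visual impression, the correlation can be computed directly. The sketch below is an illustration under stated assumptions: it uses a synthetic stand-in DataFrame (`demo_df`, made-up values, not real DHS data) that reuses the column names `Mean_nightlight` and `Wealth Score` from the dataset. In the notebook, `dhs_nightlights_df` would take its place.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for dhs_nightlights_df (illustrative values only)
rng = np.random.default_rng(0)
nightlight = rng.uniform(0, 60, size=200)
wealth = 0.05 * nightlight + rng.normal(0, 0.5, size=200)
demo_df = pd.DataFrame({"Mean_nightlight": nightlight, "Wealth Score": wealth})

cols = ["Mean_nightlight", "Wealth Score"]

# Pearson correlation: strength of the linear relationship
pearson_r = demo_df["Mean_nightlight"].corr(demo_df["Wealth Score"])

# Spearman correlation computed as the Pearson correlation of the ranks,
# which captures monotonic (not necessarily linear) relationships
ranks = demo_df[cols].rank()
spearman_rho = ranks["Mean_nightlight"].corr(ranks["Wealth Score"])

print(f"Pearson r = {pearson_r:.3f}, Spearman rho = {spearman_rho:.3f}")
```

Reporting both coefficients is a design choice: nightlight values are often skewed, so the rank-based measure can be more robust than the linear one.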

<a id='generating_dataset'></a>

## Generating DHS + Nightlights dataset

Importing DHS Survey data and nightlights features

In [None]:
# Magic commands to enable autoreload of imported packages
%load_ext autoreload
%autoreload 2

In [None]:
from poverty_predictor.get_dhsdata import GetDHSData
from poverty_predictor.merge_dataframes import merge_dhs, merge_night_dhs

Fetching DHS data

In [None]:
data = GetDHSData('../data/GPS/Malawi/MWGE7AFL.shp', '../data/Survey/Malawi/MWHR7AFL.DTA')
gps_data = data.gps_df()
survey_data = data.survey_df()

In [None]:
dhs_data = merge_dhs(gps_data, survey_data)

Creating nightlight features and combining these features with DHS data

In [None]:
tiff_path = "../data/Nightlights/F182010.v4d_web.stable_lights.avg_vis.tif"
nightlight_dhs_df = merge_night_dhs(dhs_data, tiff_path)

Saving the combined DataFrame in CSV format.

In [None]:
nightlight_dhs_df.to_csv("../data/DHS_Nightlights/Malawi/DHS_Nightlights_data.csv", index=False)
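`DataFrame.to_csv` does not create missing directories, so the save step fails if the output folder does not exist yet. The sketch below shows one way to guard against that. `save_csv_creating_dirs` is a hypothetical helper (not part of `poverty_predictor`), and the demo writes made-up values to a temporary directory rather than to the project's data folder.

```python
from pathlib import Path
import tempfile

import pandas as pd


def save_csv_creating_dirs(df: pd.DataFrame, csv_path: str) -> Path:
    """Hypothetical helper: write df to csv_path, creating parent directories first."""
    target = Path(csv_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(target, index=False)
    return target


# Demonstration with a throwaway directory and made-up values
demo_df = pd.DataFrame({"Mean_nightlight": [0.0, 12.5], "Wealth Score": [-0.4, 1.1]})
with tempfile.TemporaryDirectory() as tmp_dir:
    written = save_csv_creating_dirs(demo_df, f"{tmp_dir}/DHS_Nightlights/Malawi/demo.csv")
    round_trip = pd.read_csv(written)

print(round_trip)
```

In the notebook, the helper could be called with `nightlight_dhs_df` and the output path from the cell above.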