# Development of SDG Matrix for PCBS

The may objective of the UNSD/DFID project technical mission to PCBS (from 31 March to 4 April 2019) is to support PCBS in setting up:
- A working prototype of an SDG data platform 
- A workflow to update and maintain the SDG data platform prototype using, as much as possible, automation tools

**Deliverables:**
1. A national SDG indicators matrix
   - Indicator name
   - Mapping to global SDG indicator framework
   - Source
   - Availability
2. A national SDG data structure definition
   - Indicator code
   - Indicator description
   - Available disaggregations (dimensions)
   - Available attributes 
3. Data collection and validation templates
4. A repository of available SDG data,  including:
   - A master version of the national DSD and Code Lists
   - Set of SDG indicator tables following the national DSD
   - Pivot/extended views of the SDG indicator tables, suitable for data dissemination
5. A repository of boundary data for the available geographic disaggregations 
6. A repository of core metadata files and graphic design assets
7. A repository of python scripts for dataflow automation, which may include:
   - Data pre-processing
   - Creation of pivot/extended views of the SDG indicator tables for dissemination
   - Linking of SDG indicator tables with geographic boundaries and publication into ArcGIS online


## 1. Setup of working environment for this notebook

In [17]:
import pandas as pd
import os 

dir_path = os.path.dirname(os.path.realpath('__file__'))
print(dir_path)


input_dir = r'../SDG-Matrix/Input/'
print('data inputs dir: ' + input_dir)

output_dir = r'../SDG-Matrix/Output/'
print('outputs dir: ' + output_dir)


C:\Users\L.GonzalezMorales\Documents\GitHub\PCBS\notebooks
data inputs dir: ../SDG-Matrix/Input/
outputs dir: ../SDG-Matrix/Output/


## 2. Develop SDG Matrix template

## 2.1 Read `SDGsMatrix.csv` file provided by PCBS 

The `SDGsMatrix.csv` file contains the list of all SDG indicators

In [18]:
sdgMatrix_df = pd.read_csv(input_dir + 'SDGsMatrix.csv', encoding='UTF-8', 
                              dtype={'Goal_ID':str,
                                     'Goal_DescEn':str,
                                     'Goal_DescAr':str,
                                     'Target_ID':str,
                                     'Target_DescEn':str,
                                     'Target_DescAr': str,
                                     'Indicator_NL': str,
                                     'Indicator_Code': str,
                                     'Indicator_descEn': str,
                                     'Indicator_descAr': str,
                                     'Subindicator_Code': str, 
                                     'Subindicator_DescEn': str,
                                     'Subindicator_DescAr': str })
sdgMatrix_df.head(5)

Unnamed: 0,Goal_ID,Goal_DescEn,Goal_DescAr,Target_ID,Target_DescEn,Target_DescAr,Indicator_NL,Indicator_Code,Indicator_descEn,Indicator_descAr,...,InternetSpeed_DescEn,InternetSpeed_DescAr,OBS_VALUE,UNIT_MULT,UNIT_MEASURE,OBS_STATUS,TIME_DETAIL,COMMENT_OBS,BASE_PER,SOURCE_DETAIL
0,1,No poverty,القضاء على الفقر,1.1,"By 2030, eradicate extreme poverty for all peo...",القضاء على الفقر المدقع للناس أجمعين أينما كان...,1.1.1,C010101,Proportion of population below the internatio...,نسبة السكان الذين يعيشون دون خط الفقر الدولي ب...,...,,,,,,,,,,
1,1,No poverty,القضاء على الفقر,1.1,"By 2030, eradicate extreme poverty for all peo...",القضاء على الفقر المدقع للناس أجمعين أينما كان...,1.1.1,C010101,Proportion of population below the internatio...,نسبة السكان الذين يعيشون دون خط الفقر الدولي ب...,...,,,,,,,,,,
2,1,No poverty,القضاء على الفقر,1.1,"By 2030, eradicate extreme poverty for all peo...",القضاء على الفقر المدقع للناس أجمعين أينما كان...,1.1.1,C010101,Proportion of population below the internatio...,نسبة السكان الذين يعيشون دون خط الفقر الدولي ب...,...,,,,,,,,,,
3,1,No poverty,القضاء على الفقر,1.1,"By 2030, eradicate extreme poverty for all peo...",القضاء على الفقر المدقع للناس أجمعين أينما كان...,1.1.1,C010101,Proportion of population below the internatio...,نسبة السكان الذين يعيشون دون خط الفقر الدولي ب...,...,,,,,,,,,,
4,1,No poverty,القضاء على الفقر,1.1,"By 2030, eradicate extreme poverty for all peo...",القضاء على الفقر المدقع للناس أجمعين أينما كان...,1.1.1,C010101,Proportion of population below the internatio...,نسبة السكان الذين يعيشون دون خط الفقر الدولي ب...,...,,,,,,,,,,


*Check number of rows and columns:*

In [19]:
print(sdgMatrix_df.shape)
print(sdgMatrix_df.columns)

(47439, 76)
Index(['Goal_ID', 'Goal_DescEn', 'Goal_DescAr', 'Target_ID', 'Target_DescEn',
       'Target_DescAr', 'Indicator_NL', 'Indicator_Code', 'Indicator_descEn',
       'Indicator_descAr', 'Subindicator_Code', 'Subindicator_DescEn',
       'Subindicator_DescAr', 'TIME_PERIOD', 'REF_AREA', 'REF_AREA_DESC_EN',
       'REF_AREA_DESC_AR', 'SEX', 'SEX_DESC_EN', 'SEX_DESC_AR', 'AGE',
       'Age_DescEn', 'Age_DescAr', 'URBANISATION', 'Urbanisation_DescEn',
       'Urbanisation_DescAr', 'EDUCATION_LEV', 'Education_DescEn',
       'Education_DescAr', 'DISABILITY_STATUS', 'DisabilityStatus_DescEn',
       'DisabilityStatus_DescAr', 'OCCUPATION', 'Occupation_DescEn',
       'Occupation_DescAr', 'Sector', 'Sector_DescEn', 'Sector_DescAr',
       'AreaOfStudy', 'AreaOfStudy_DescEn', 'AreaOfStudy_DescAr',
       'TypeOfViolence', 'TypeOfViolence_DescEn', 'TypeOfViolence_DescAr',
       'EmploymentStatus', 'EmploymentStatus_DescEn',
       'EmploymentStatus_DescAr', 'PregnancyStatus', 'Pregnan

## 2.2 Identify dimensions

The template includes 18 coded dimensions (in addition to **`TIME_PERIOD`**), namely:
- **`REF_AREA`**: Reference area (geographic area)
- **`SEX`**: Male/female
- **`URBANISATION`**: Urban/rural
- **`EDUCATION_LEV`**: Education level
- **`DISABILITY_STATUS`**: Disability status
- **`OCCUPATION`**: Occupation
- **`SECTOR`**: Economic activity
- **`STUDY_AREA`**: Area of study
- **`VIOLENCE_TYPE`**: Type of violence
- **`EMPLOYMENT_STATUS`**: Employment status
- **`PREGNANCY_STATUS`**: Pregnancy status
- **`WORKING_INJURY_STATUS`**: Working injury status
- **`POVERTY_STATUS`**: Poverty status
- **`SCHOOLING_YEARS`**: Years of schooling
- **`MOBILE_NETWORK_TECHNOLOGY`**: Mobile network technology
- **`ECOSYSTEM_TYPE`**: Type of ecosystem
- **`INTERNET_SPEED`**: Internet speed

In addition to the dimension codes, the matrix also includes English and Arabic description for each of them.  


For the sake of standardization, the dimension columns of the draft matrix will be written in all-capital letters:

In [20]:

sdgMatrix_df = sdgMatrix_df.rename(columns = {'Age_DescEn':'AGE_DESC_EN', 
                                                    'Age_DescAr':'AGE_DESC_AR',
                                                    'Urbanisation_DescEn': 'URBANISATION_DESC_EN', 
                                                    'Urbanisation_DescAr': 'URBANISATION_DESC_AR',
                                                    'Education_DescEn': 'EDUCATION_LEV_DESC_EN', 
                                                    'Education_DescAr': 'EDUCATION_LEV_DESC_AR',
                                                    'DisabilityStatus_DescEn': 'DISABILITY_STATUS_DESC_EN', 
                                                    'DisabilityStatus_DescAr': 'DISABILITY_STATUS_DESC_AR',
                                                    'Occupation_DescEn': 'OCCUPATION_DESC_EN', 
                                                    'Occupation_DescAr': 'OCCUPATION_DESC_AR',
                                                    'Sector': 'SECTOR', 
                                                    'Sector_DescEn': 'SECTOR_DESC_EN', 
                                                    'Sector_DescAr': 'SECTOR_DESC_AR',
                                                    'AreaOfStudy': 'STUDY_AREA', 
                                                    'AreaOfStudy_DescEn': 'STUDY_AREA_DESC_EN', 
                                                    'AreaOfStudy_DescAr': 'STUDY_AREA_DESC_AR',
                                                    'TypeOfViolence': 'VIOLENCE_TYPE', 
                                                    'TypeOfViolence_DescEn': 'VIOLENCE_TYPE_DESC_EN', 
                                                    'TypeOfViolence_DescAr': 'VIOLENCE_TYPE_DESC_AR',
                                                    'EmploymentStatus': 'EMPLOYMENT_STATUS', 
                                                    'EmploymentStatus_DescEn': 'EMPLOYMENT_STATUS_DESC_EN', 
                                                    'EmploymentStatus_DescAr': 'EMPLOYMENT_STATUS_DESC_AR', 
                                                    'PregnancyStatus': 'PREGNANCY_STATUS', 
                                                    'PregnancyStatus_DescEn': 'PREGNANCY_STATUS_DESC_EN', 
                                                    'PregnancyStatus_DescAr': 'PREGNANCY_STATUS_DESC_AR',
                                                    'WorkingInjuryStatus': 'WORKING_INJURY_STATUS', 
                                                    'WorkingInjuryStatus_DescEn': 'WORKING_INJURY_STATUS_DESC_EN', 
                                                    'WorkingInjuryStatus_DescAr': 'WORKING_INJURY_STATUS_DESC_AR',
                                                    'PovertyStatus': 'POVERTY_STATUS', 
                                                    'PovertyStatus_DescEn': 'POVERTY_STATUS_DESC_EN', 
                                                    'PovertyStatus_DescAr': 'POVERTY_STATUS_DESC_AR',
                                                    'YearsOfSchooling': 'SCHOOLING_YEARS', 
                                                    'YearsOfSchooling_DescEn': 'SCHOOLING_YEARS_DESC_EN', 
                                                    'YearsOfSchooling_DescAr': 'SCHOOLING_YEARS_DESC_AR', 
                                                    'MobileNetworkTechnology': 'MOBILE_NETWORK_TECHNOLOGY', 
                                                    'MobileNetworkTechnology_DescEn': 'MOBILE_NETWORK_TECHNOLOGY_DESC_EN', 
                                                    'MobileNetworkTechnology_DescAr': 'MOBILE_NETWORK_TECHNOLOGY_DESC_AR',
                                                    'EcosystemType': 'ECOSYSTEM_TYPE', 
                                                    'EcosystemType_DescEn': 'ECOSYSTEM_TYPE_DESC_EN', 
                                                    'EcosystemType_DescAr': 'ECOSYSTEM_TYPE_DESC_AR',
                                                    'InternetSpeed': 'INTERNET_SPEED', 
                                                    'InternetSpeed_DescEn': 'INTERNET_SPEED_DESC_EN', 
                                                    'InternetSpeed_DescAr': 'INTERNET_SPEED_DESC_AR'})

Use capital letter for all other column headings:

In [21]:
sdgMatrix_df = sdgMatrix_df.rename(columns = {'Goal_ID': 'GOAL_ID', 
                                              'Goal_DescEn': 'GOAL_DESC_EN', 
                                              'Goal_DescAr': 'GOAL_DESC_AR',
                                              'Target_ID': 'TARGET_ID', 
                                              'Target_DescEn': 'TARGET_DESC_EN',
                                              'Target_DescAr': 'TARGET_DESC_AR', 
                                              'Indicator_NL': 'INDICATOR_LABEL', 
                                              'Indicator_Code': 'INDICATOR_ID', 
                                              'Indicator_descEn': 'INDICATOR_DESC_EN',
                                              'Indicator_descAr': 'INDICATOR_DESC_AR', 
                                              'Subindicator_Code': 'SUBINDICATOR_ID',
                                              'Subindicator_DescEn': 'SUBINDICATOR_DESC_EN',
                                              'Subindicator_DescAr': 'SUBINDICATOR_DESC_AR'})

In [24]:
sdgMatrix_df.head(5)

Unnamed: 0,GOAL_ID,GOAL_DESC_EN,GOAL_DESC_AR,TARGET_ID,TARGET_DESC_EN,TARGET_DESC_AR,INDICATOR_LABEL,INDICATOR_ID,INDICATOR_DESC_EN,INDICATOR_DESC_AR,...,INTERNET_SPEED_DESC_EN,INTERNET_SPEED_DESC_AR,OBS_VALUE,UNIT_MULT,UNIT_MEASURE,OBS_STATUS,TIME_DETAIL,COMMENT_OBS,BASE_PER,SOURCE_DETAIL
0,1,No poverty,القضاء على الفقر,1.1,"By 2030, eradicate extreme poverty for all peo...",القضاء على الفقر المدقع للناس أجمعين أينما كان...,1.1.1,C010101,Proportion of population below the internatio...,نسبة السكان الذين يعيشون دون خط الفقر الدولي ب...,...,,,,,,,,,,
1,1,No poverty,القضاء على الفقر,1.1,"By 2030, eradicate extreme poverty for all peo...",القضاء على الفقر المدقع للناس أجمعين أينما كان...,1.1.1,C010101,Proportion of population below the internatio...,نسبة السكان الذين يعيشون دون خط الفقر الدولي ب...,...,,,,,,,,,,
2,1,No poverty,القضاء على الفقر,1.1,"By 2030, eradicate extreme poverty for all peo...",القضاء على الفقر المدقع للناس أجمعين أينما كان...,1.1.1,C010101,Proportion of population below the internatio...,نسبة السكان الذين يعيشون دون خط الفقر الدولي ب...,...,,,,,,,,,,
3,1,No poverty,القضاء على الفقر,1.1,"By 2030, eradicate extreme poverty for all peo...",القضاء على الفقر المدقع للناس أجمعين أينما كان...,1.1.1,C010101,Proportion of population below the internatio...,نسبة السكان الذين يعيشون دون خط الفقر الدولي ب...,...,,,,,,,,,,
4,1,No poverty,القضاء على الفقر,1.1,"By 2030, eradicate extreme poverty for all peo...",القضاء على الفقر المدقع للناس أجمعين أينما كان...,1.1.1,C010101,Proportion of population below the internatio...,نسبة السكان الذين يعيشون دون خط الفقر الدولي ب...,...,,,,,,,,,,


In [32]:
sdgMatrix_subindicators = sdgMatrix_df.copy()

sdgMatrix_subindicators = sdgMatrix_subindicators.drop(['TIME_PERIOD'], axis=1).drop_duplicates

print(sdgMatrix_subindicators)


<bound method DataFrame.drop_duplicates of       GOAL_ID                GOAL_DESC_EN                 GOAL_DESC_AR  \
0           1                  No poverty             القضاء على الفقر   
1           1                  No poverty             القضاء على الفقر   
2           1                  No poverty             القضاء على الفقر   
3           1                  No poverty             القضاء على الفقر   
4           1                  No poverty             القضاء على الفقر   
5           1                  No poverty             القضاء على الفقر   
6           1                  No poverty             القضاء على الفقر   
7           1                  No poverty             القضاء على الفقر   
8           1                  No poverty             القضاء على الفقر   
9           1                  No poverty             القضاء على الفقر   
10          1                  No poverty             القضاء على الفقر   
11          1                  No poverty             القضاء على الفق