# 3-assessment-factors
This notebook prepares the dataset for the assessment factors analysis, which shows the volume of assessments with particular factors across London.

Input: the main flatfile containing the Children in Need (CIN) information from all local authorities (LAs)

Output: table with
- All columns from the assessment authorisation records
- One column per factor, with 1 = factor identified at assessment, 0 = factor not identified
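
As an illustration of the target layout, the sketch below builds the indicator columns for a small made-up frame. The `Factors` column name and the comma separator come from the code in this notebook; the factor codes and the `LAchildID` column are invented for the example.

```python
import pandas as pd

# Toy data: the factor codes and LAchildID values are made up for illustration
toy = pd.DataFrame({
    "LAchildID": ["a", "b", "c"],
    "Factors": ["1A,2B", "2B", "1A,3C"],
})

# One indicator column per distinct factor code (1 = present, 0 = absent)
indicators = toy["Factors"].str.get_dummies(sep=",")

# Original columns followed by the indicator columns
toy_out = pd.concat([toy, indicators], axis=1)
print(toy_out)
```

Each row keeps its original columns and gains one 0/1 column per factor code seen anywhere in the data.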

In [None]:
import os
import pandas as pd

%run "00-config.ipynb"
%load_ext autoreload
%autoreload 2

### Define filepaths

In [None]:
input_file = os.path.join(flatfile_folder, 'main_flatcin.csv')
output_file = os.path.join(output_folder, 'assessments.csv')

### Data wrangling

In [None]:
# Load flatfile
df = pd.read_csv(input_file)

# Only keep assessment authorised
df = df[df.Type == 'AssessmentAuthorisationDate']

# Remove columns that are entirely empty to get a smaller dataset
# (reassign rather than use inplace=True, since df is a filtered selection)
df = df.dropna(axis=1, how='all')

In [None]:
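# Split the comma-separated Factors column into one column per factor, holding 0 or 1.
# Note: sum(level=0) was removed in pandas 2.0, so group by the original row index instead.
factor_cols = df.Factors.str.split(',', expand=True).stack().str.get_dummies().groupby(level=0).sum()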

# Check all values are either 0 or 1 - needs to return True
print(factor_cols.isin([0,1]).all().all())

factor_cols.head()
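
The check matters because the indicator columns are built by summing dummies per row: if the same factor code appears twice in one assessment's `Factors` string, the sum is 2 rather than 1. Below is a minimal sketch on made-up codes showing that failure and one possible remedy, taking the per-row maximum instead of the sum. The remedy is an assumption about the desired behaviour, not something this notebook does.

```python
import pandas as pd

# Made-up factor codes; the first row repeats "1A"
s = pd.Series(["1A,1A,2B", "2B"], name="Factors")

# One dummy row per (original row, factor position)
dummies = s.str.split(",", expand=True).stack().str.get_dummies()

# Summing per original row counts the duplicate twice
summed = dummies.groupby(level=0).sum()

# Taking the maximum per original row caps every value at 1
capped = dummies.groupby(level=0).max()

print(summed.isin([0, 1]).all().all())
print(capped.isin([0, 1]).all().all())
```

If the check fails on real data, inspecting the offending rows before choosing a remedy is safer than capping silently, since duplicates may point to a data quality issue upstream.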

In [None]:
# Merge the factor columns back onto the main df (rows are aligned on the index)
df = pd.concat([df, factor_cols], axis=1)
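
Because the merge aligns on the row index, any assessment that has no row in `factor_cols` (for example, one whose `Factors` value is missing) ends up with NaN rather than 0 in the factor columns. If 0 is the intended value for those rows, one option is to reindex before merging. The sketch below uses made-up data and assumes that a missing `Factors` value means "no factor identified", which may not hold for every local authority's return.

```python
import pandas as pd

# Made-up data: the middle row has no recorded factors
toy = pd.DataFrame({"Factors": ["1A,2B", None, "2B"]})

# Indicator columns built from the non-missing rows only
factor_cols = toy["Factors"].dropna().str.get_dummies(sep=",")

# Align to the full index, filling rows without factors with 0
factor_cols_full = factor_cols.reindex(toy.index, fill_value=0)

merged = pd.concat([toy, factor_cols_full], axis=1)
print(merged)
```

Whether to fill with 0 or keep NaN depends on whether the analysis needs to distinguish "no factors identified" from "factors not recorded".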

In [None]:
# Save
df.to_csv(output_file, index=False)