## dataframecleaning(table:dict, subset:list)
Transforms a dictionary of lists into a dataframe, runs quality checks on it, and returns both the flagged and the cleaned dataframes.

parameter|description
---|---
table|the dictionary of lists, one list per column
subset|the list of column names compared when checking for duplicate rows

In [37]:
import pandas

def dataframecleaning(table:dict,subset:list):
    dataframe = pandas.DataFrame(table)
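    # Flag rows that repeat an earlier row on the subset columns.
    duplicates_check = dataframe.duplicated(subset=subset)
    # Count the missing values per row before the flag column is added.
    dataframe["none_count"] = dataframe.isnull().sum(axis=1)
    dataframe["duplicate"] = duplicates_check
    # Keep the non-duplicated rows and drop the two flag columns.
    dataframe_cleaned = dataframe[~dataframe["duplicate"]].drop(['duplicate','none_count'], axis=1)
    # dropna returns a new dataframe, so the result has to be reassigned.
    dataframe_cleaned = dataframe_cleaned.dropna(how='all')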
    return dataframe, dataframe_cleaned
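
The effect of the subset parameter can be illustrated with a minimal sketch of `DataFrame.duplicated`. The sample data below is hypothetical and only serves the illustration; it is not part of the project.

```python
import pandas

# Hypothetical sample data, used only for this illustration.
sample = pandas.DataFrame({
    "link": ["a", "b", "c"],
    "locality": ["Antwerp", "Bruxelles", "Bruxelles"],
    "garden": ["yes", "no", "no"],
})

# Without subset, every column is compared: the links differ,
# so no row counts as a duplicate.
all_columns = sample.duplicated()

# With subset, only the listed columns are compared. The default
# keep="first" leaves the first occurrence unflagged.
by_subset = sample.duplicated(subset=["locality", "garden"])

print(all_columns.tolist())  # [False, False, False]
print(by_subset.tolist())    # [False, False, True]
```

Leaving the link column out of the subset is what allows two listings of the same property under different URLs to be flagged.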

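## makecsv(table:dict, subset:list, path=os.path.abspath(''))
Cleans a dictionary of lists with dataframecleaning, then writes the flagged and the cleaned dataframes to flagged.csv and cleaned.csv in the folder given by path, and returns both dataframes.
The "os" module is used to build the file paths and to resolve the absolute path of the current working directory, which is the default value of path.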

In [38]:
import os

def makecsv(table:dict, subset:list, path=os.path.abspath('')):
    flagged, cleaned = dataframecleaning(table,subset)
    filepath = os.path.join(path, "flagged.csv")
    flagged.to_csv(filepath)
    filepath = os.path.join(path, "cleaned.csv")
    cleaned.to_csv(filepath)
    print("CSVs created at: " + str(path))
    return flagged, cleaned
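
Because to_csv is called without index=False, the row index is written as the first, unnamed column of each file. A minimal sketch of reading such a file back, assuming a temporary folder and a small hypothetical dataframe instead of the real output:

```python
import os
import tempfile

import pandas

# Hypothetical dataframe standing in for the cleaned output.
frame = pandas.DataFrame({"link": ["http", "htt"], "garden": ["yes", "no"]})

with tempfile.TemporaryDirectory() as folder:
    filepath = os.path.join(folder, "cleaned.csv")
    # The index is written as the first, unnamed column.
    frame.to_csv(filepath)
    # index_col=0 turns that first column back into the index.
    restored = pandas.read_csv(filepath, index_col=0)

print(restored)
```

Reading the file without index_col=0 would instead produce an extra column named "Unnamed: 0".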

### Example
Merge the dictionaries main and features into a single table, clean it, and write the flagged and the cleaned CSVs into the working folder.

In [36]:
main =  {"link":["http","htt","http2","htt2"],
          "type_of_property":["house","apartment","apartment",None]}
features = {"locality":["Antwerp","Bruxelles","Bruxelles",None],
            "garden":["yes","no","no",None]}
main.update(features)
subset = ["type_of_property","locality","garden"]
df, df_cleaned = makecsv(table=main,subset=subset)

print("\n\n",df,"\n\n",df_cleaned)


CSVs created at: C:\Users\Fra\OneDrive\Desktop\belearner\challenge-collecting-data
    link type_of_property   locality garden  none_count  duplicate
0   http            house    Antwerp    yes           0      False
1    htt        apartment  Bruxelles     no           0      False
2  http2        apartment  Bruxelles     no           0       True
3   htt2             None       None   None           3      False 

    link type_of_property   locality garden
0  http            house    Antwerp    yes
1   htt        apartment  Bruxelles     no
3  htt2             None       None   None
