# Predicting the house price

This project aims to build a reliable predictor for house prices, based on the Kaggle competition:

https://www.kaggle.com/c/predict-the-housing-price

In this notebook the data is imported and preprocessed. The neighbourhood information is used to calculate the distance from the town centre, the data is cleaned, and non-numerical variables are transformed either through encoding or through the creation of dummy variables.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#from pandas_profiling import ProfileReport
from geopy.geocoders import Nominatim

%matplotlib inline

In [2]:
php_data= pd.read_csv('data/train.csv')

In [3]:
php_data.shape

(1021, 81)

In [22]:
php_data.dtypes.unique()

array([dtype('int64'), dtype('O'), dtype('float64')], dtype=object)

In [13]:
php_data_copy = php_data.copy()
php_data_copy =php_data_copy.drop(['Id','SalePrice'],axis=1)
php_data_copy.select_dtypes(include=['float', 'int','float64', 'int64']).shape

(1021, 36)

In [23]:
php_data_copy = php_data.copy()
php_data_copy =php_data_copy.drop(['Id','SalePrice'],axis=1)
php_data_copy.select_dtypes(include=['O']).shape

(1021, 43)

Of the 79 explanatory variables, 36 are numerical (both categorical and quantitative) and 43 are non-numerical.

## Data preparation

The data provided is both numerical and categorical, and some fields probably need dummy variables. A complete explanation of the data fields is available on the dataset page on Kaggle; here I give a short description of how I plan to make the data suitable for modelling.

Note that I build the encodings manually. The train and test data are provided separately and I assume they must be kept separate, so automatically generated dummy variables could differ between the two: not every value of every variable is necessarily represented in both datasets.
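One way to keep dummy columns identical across train and test, sketched here under the assumption that the full list of allowed values is known in advance (for instance from the data description), is to declare the categories explicitly before calling `pd.get_dummies`. The toy frames, the category list and the helper `make_dummies` are illustrative and are not part of the notebook.

```python
import pandas as pd

# Illustrative category list; in practice it would come from the data description
sale_conditions = ['Abnorml', 'AdjLand', 'Alloca', 'Family', 'Normal', 'Partial']

train_toy = pd.DataFrame({'SaleCondition': ['Normal', 'Abnorml', 'Partial']})
test_toy = pd.DataFrame({'SaleCondition': ['Normal', 'Family']})

def make_dummies(frame, column, categories):
    """Return dummy columns for `column`, one per entry in `categories`."""
    fixed = frame[column].astype(pd.CategoricalDtype(categories=categories))
    return pd.get_dummies(fixed)

train_dummies = make_dummies(train_toy, 'SaleCondition', sale_conditions)
test_dummies = make_dummies(test_toy, 'SaleCondition', sale_conditions)

# Both frames have the same columns in the same order,
# even though each toy frame only contains some of the values
print(list(train_dummies.columns) == list(test_dummies.columns))
```

With fixed categories, a value that never appears in one dataset still gets an all-zero column there, so the two feature tables stay compatible.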

The _neighborhood_ field gives geographical information about the house, namely the neighbourhood of Ames it belongs to. My strategy to make use of this information is to convert it into the distance between the neighbourhood and the town centre. Note that this treats the town centre as the single point of interest; for other or bigger cities, multiple points of interest might be considered (stadiums, museums, parks, etc.).

After data preparation, feature engineering and dimensionality reduction are explained in the feature engineering notebook.


# NOTE - Skip to the CHECKPOINT section if you already have the file with the geographical information
### Getting neighbourhoods coordinates, and distance from the centre

In the following cells I use the geopy package to collect the coordinates of the Ames neighbourhoods. Unfortunately, the _neighborhood_ field contains abbreviations of the names, so it cannot be used as it is.

To make things easier I stored the values of _neighborhood_ in a csv file and for each one I copied the full name of the neighbourhood, or a name that gets recognised by the geolocator function for that neighbourhood. This is obviously an _ad hoc_ solution that works for this town and would have to be done anew for another town. The alternative is to drop the variable.

In [154]:
from geopy.exc import GeocoderTimedOut

Neighborhood = pd.read_csv('data/Ames_neighborhoods.csv',names=['Code','Full_name'])

# Recent geopy versions require an explicit user_agent; the name used here is a placeholder
geolocator = Nominatim(user_agent='ames_house_prices')
Ames_coordinates={}
for loc in Neighborhood.Full_name:
    if loc not in Ames_coordinates:
        try:
            # geocode() returns None when nothing matches, so fall back to a wider query
            location = geolocator.geocode(loc +" ,Ames,Iowa") or geolocator.geocode(loc +" ,Iowa")
        except GeocoderTimedOut:
            # Stop at the first timeout; the cell can then be run again
            break
        if location is None:
            print(loc, 'not found')
            continue
        print(loc)
        print(location.address)
        Ames_coordinates[loc]=[location.latitude, location.longitude]

Run the cell again if timeout errors arise. At the end, I have a dictionary with the coordinates of each neighbourhood.
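Instead of re-running the cell by hand, the retry can be wrapped in a small helper. The sketch below is a generic illustration: `geocode_with_retry` and `flaky_geocode` are hypothetical names, and the fake geocoder only simulates timeouts so that the example runs without network access. With geopy one would presumably pass `geolocator.geocode` as the callable and `retry_on=(GeocoderTimedOut,)`.

```python
import time

def geocode_with_retry(geocode, query, retries=3, wait_seconds=1.0, retry_on=(Exception,)):
    """Call geocode(query), retrying up to `retries` times on the given exceptions."""
    for attempt in range(retries):
        try:
            return geocode(query)
        except retry_on:
            if attempt == retries - 1:
                raise  # give up after the last attempt
            time.sleep(wait_seconds)

# Stand-in for a real geocoder: fails twice, then answers
calls = {'n': 0}

def flaky_geocode(query):
    calls['n'] += 1
    if calls['n'] < 3:
        raise TimeoutError('simulated timeout')
    return (42.026309, -93.617379)

result = geocode_with_retry(flaky_geocode, 'Ames, Iowa', wait_seconds=0.0)
print(result, calls['n'])
```

Waiting between attempts also helps to stay within the rate limits of a public geocoding service.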

In [175]:
for i in range(len(Neighborhood.Full_name)):
    Neighborhood.loc[i,'Latitude'] = Ames_coordinates[Neighborhood.loc[i,'Full_name']][0]
    Neighborhood.loc[i,'Longitude'] = Ames_coordinates[Neighborhood.loc[i,'Full_name']][1]
Neighborhood.head()

Unnamed: 0,Code,Full_name,Latitude,Longitude
0,Blmngtn,Bloomington,42.055627,-93.619566
1,Blueste,Bluestem,42.01117,-93.645063
2,BrDale,Truman Place,42.052547,-93.628226
3,BrkSide,Brookside,42.02677,-93.617055
4,ClearCr,Clear Creek,41.994591,-93.261668


In [163]:
Neighborhood.iloc[0].Latitude

42.0379557

In [176]:
Ames_centre=[42.026309, -93.617379]

The next cell defines a function that calculates the great-circle distance between two points given by their spatial coordinates.

Reference: https://colab.research.google.com/drive/1hHIBZIms7iotpWSZDMdO_DMMBVfdzrXC

In [177]:

from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 
    km = 6367 * c
    return km

In [178]:
for i in range(len(Neighborhood.Full_name)):
    dist = haversine(Ames_centre[1], Ames_centre[0], Neighborhood.iloc[i].Longitude, Neighborhood.iloc[i].Latitude)
    Neighborhood.loc[i,'Distance_from_centre'] = dist
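The row-by-row loop above can also be written as a single vectorised call. The following is a minimal sketch, assuming NumPy arrays of coordinates; `haversine_np` is a hypothetical name, and the three coordinate pairs are copied from the table shown earlier (Bloomington, Bluestem, Brookside).

```python
import numpy as np

def haversine_np(lon1, lat1, lon2, lat2, radius_km=6367.0):
    """Great-circle distance in km; accepts scalars or NumPy arrays in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * radius_km * np.arcsin(np.sqrt(a))

centre_lat, centre_lon = 42.026309, -93.617379      # Ames_centre from the notebook
lats = np.array([42.055627, 42.01117, 42.02677])    # Bloomington, Bluestem, Brookside
lons = np.array([-93.619566, -93.645063, -93.617055])

# One call computes all distances at once
distances = haversine_np(centre_lon, centre_lat, lons, lats)
print(distances)
```

Applied to the notebook's table, the same call could take `Neighborhood.Longitude` and `Neighborhood.Latitude` as the array arguments and assign the result to the `Distance_from_centre` column.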

### Save Neighbourhoods information

In [194]:
Neighborhood.to_csv('data/Neighborhood.coordinates.csv')
Neighborhood.head()

Unnamed: 0,Code,Full_name,Latitude,Longitude,Distance_from_centre
0,Blmngtn,Bloomington,42.055627,-93.619566,3.262962
1,Blueste,Bluestem,42.01117,-93.645063,2.837899
2,BrDale,Truman Place,42.052547,-93.628226,3.050035
3,BrkSide,Brookside,42.02677,-93.617055,0.057804
4,ClearCr,Clear Creek,41.994591,-93.261668,29.581234


# CHECKPOINT
### Load Neighbourhoods information

In [5]:
Neighborhood = pd.read_csv('data/Neighborhood.coordinates.csv')
Neighborhood=Neighborhood.drop('Unnamed: 0',axis=1)
Neighborhood.head()

Unnamed: 0,Code,Full_name,Latitude,Longitude,Distance_from_centre
0,Blmngtn,Bloomington,42.055627,-93.619566,3.262962
1,Blueste,Bluestem,42.01117,-93.645063,2.837899
2,BrDale,Truman Place,42.052547,-93.628226,3.050035
3,BrkSide,Brookside,42.02677,-93.617055,0.057804
4,ClearCr,Clear Creek,41.994591,-93.261668,29.581234


### Data cleaning and dummy variables
Categorical variables are translated into quantitative features, trying to keep some meaning in the encoding. See the data exploration notebook to see how some variables scale with the target.
Notes:
- Condition1 and Condition2: elements are considered negative (-1), normal, i.e. neutral (0), or positive (1), according to the description provided;
- BldgType: similar considerations apply to the building type;
- YearBuilt and YearRemodAdd: I opted to convert the years of construction and remodelling into the age of the house and of the remodelling;
- RoofStyle: tentative numbering proportional to complexity;
- RoofMatl: similar, with cost information also taken into consideration;
- MoSold: the month of sale is removed, as secondary to the year of sale;
- SaleType: from data exploration, this field does not appear to have a large impact, except for the value "New", which has a higher average. I therefore create a variable "New" with values 0 and 1;
- SaleCondition: this field lists a series of circumstances that can have a big influence on the sale price. I create dummy variables in order to retain all the information.
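The dictionary-based encoding described in these notes can be sketched as a small helper that also reports labels missing from the dictionary, rather than failing with a bare `KeyError` inside a lambda. `encode_ordinal` is a hypothetical helper, not a function from the notebook; the mapping repeats the quality scale used for `ExterQual`.

```python
import pandas as pd

# Same quality scale as the ExterQual dictionary defined in the notebook
ExterQual = {'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5}

def encode_ordinal(series, mapping, fill=None):
    """Map category labels to numbers, failing loudly on unmapped labels."""
    if fill is not None:
        series = series.fillna(fill)
    unknown = set(series.dropna().unique()) - set(mapping)
    if unknown:
        raise KeyError(f'unmapped values in {series.name}: {sorted(unknown)}')
    return series.map(mapping)

toy = pd.Series(['TA', 'Gd', 'Ex', 'TA'], name='ExterQual')
encoded = encode_ordinal(toy, ExterQual)
print(encoded.tolist())
```

Naming the offending labels makes it easier to spot spelling variants between train and test, such as the `'Wd Shng'` and `'WdShing'` pair handled in the `Exterior1st` dictionary.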

In [6]:
MSZoning = {'A':1,'C':2,'C (all)':2,'FV':3,'I':4,'RH':5,'RL':6,'RP':7,'RM':8}
Street = {'Grvl':1,'Pave':2,'unknown':0}
LotShape= {'Reg':0,'IR1':1,'IR2':2,'IR3':3}
LandContour={'Lvl':0,'Bnk':1,'HLS':2,'Low':3}
Utilities = {'AllPub':0,'NoSewr':1,'NoSeWa':2,'ELO':3}
LotConfig= {'Inside':0,'Corner':1,'CulDSac':2,'FR2':3,'FR3':4}
LandSlope ={'Gtl':0,'Mod':1,'Sev':2}
Condition = {'PosN':1,'PosA':1,'Norm':0,'Artery':-1,'Feedr':-1,'RRNn':-1,'RRAn':-1,'RRNe':-1,'RRAe':-1}
BldgType ={'Twnhs':0,'TwnhsI':0,'TwnhsE':1,'Duplex':2,'2fmCon':3,'1Fam':4}

HouseStyle ={'SLvl':4,'1Story':6,'1.5Fin':3,'1.5Unf':1.5,'SFoyer':2,'2Story':7,'2.5Fin':8,'2.5Unf':7.5}
RoofStyle={'Flat':4,'Shed':0,'Gable':2,'Hip':5,'Gambrel':1,'Mansard':3}
RoofMatl = {'Membran':5,'Tar&Grv':1,'WdShake':4,'WdShngl':6,'Metal':2,'ClyTile':0,'CompShg':3,'Roll':-1}
Exterior1st={'CBlock':0,'Other':0,'PreCast':0,'Stone':0,'BrkComm':1,'Brk Cmn':1,'AsphShn':2,'AsbShng':3,'ImStucc':4,'Stucco':4,'Wd Sdng':5,'MetalSd':6,'HdBoard':7,'Wd Shng':8,'WdShing':8,'Plywood':9,'BrkFace':10,'VinylSd':11,'CemntBd':12,'CmentBd':12}
MasVnrType={'BrkCmn':1,'BrkFace':3,'CBlock':0,'None':2,'Stone':4}
ExterQual={'Fa':2,'Gd':4,'Po':1,'TA':3,'Ex':5}
Foundation={'BrkTil':2,'CBlock':4,'Slab':1,'Wood':3,'Stone':5,'PConc':6}
BsmtQC={'Po':1,'TA':3,'NA':0,'Fa':2,'Gd':4,'Ex':5}
BsmtExposure={'No':1,'Av':3,'NA':0,'Mn':2,'Gd':4}
BsmtFinType1 ={'NA':0,'Unf':1,'LwQ':2,'Rec':3,'BLQ':4,'ALQ':5,'GLQ':6}
Heating ={'Floor':0,'Grav':1,'Wall':2,'GasW':3,'GasA':4,'OthW':0}
HeatingQC ={'Po':1,'Fa':2,'TA':3,'Gd':4,'Ex':5}
CentralAir={'Y':1,'N':0}
Electrical={'Mix':1,'FuseP':2,'FuseF':3,'FuseA':4,'SBrkr':5,'NA':0}
KitchenQual={'Po':1,'Fa':2,'TA':3,'Gd':4,'Ex':5}
Functional= {'Typ':0,'Min1':1,'Min2':2,'Mod':3,'Maj1':4,'Maj2':5,'Sev':6,'Sal':7}
FireplaceQu= {'NA':0,'Po':1,'Fa':2,'TA':3,'Gd':4,'Ex':5}
GarageType = {'NA':0,'CarPort':1,'Detchd':2,'2Types':3,'Basment':4,'Attchd':5,'BuiltIn':6}
GarageFinish = {'NA':0,'Unf':1,'RFn':2,'Fin':3}
GarageQual= {'NA':0,'Po':1,'Fa':2,'TA':3,'Gd':4,'Ex':5}
PavedDrive={'N':0,'P':1,'Y':2}
PoolQC= {'NA':0,'Fa':1,'TA':2,'Gd':3,'Ex':4}
Fence= {'NA':0,'MnWw':1,'GdWo':2,'MnPrv':3,'GdPrv':4}

In [None]:
data=php_data.copy()
data=data.drop('Id',axis=1)

data['MSZoning']=data['MSZoning'].apply(lambda x: MSZoning[x])
data.LotFrontage=data.LotFrontage.fillna(0)
data['Street']=data['Street'].apply(lambda x: Street[x])
data.Alley=data.Alley.fillna('unknown')
data['Alley']=data['Alley'].apply(lambda x: Street[x])
data['LotShape']=data['LotShape'].apply(lambda x: LotShape[x])
data['LandContour']=data['LandContour'].apply(lambda x: LandContour[x])
data['Utilities']=data['Utilities'].apply(lambda x: Utilities[x])
data['LotConfig']=data['LotConfig'].apply(lambda x: LotConfig[x])
data['LandSlope']=data['LandSlope'].apply(lambda x: LandSlope[x])
# Look up each house's distance from the centre by its neighbourhood code
data['Distance_from_centre']=data['Neighborhood'].map(Neighborhood.set_index('Code')['Distance_from_centre'])
data['Condition1']=data['Condition1'].apply(lambda x: Condition[x])
data['Condition2']=data['Condition2'].apply(lambda x: Condition[x])
data['BldgType']=data['BldgType'].apply(lambda x: BldgType[x])
data['HouseStyle']=data['HouseStyle'].apply(lambda x: HouseStyle[x])
data['YearBuilt']=data['YearBuilt'].apply(lambda x: 2020-x)
data['YearRemodAdd']=data['YearRemodAdd'].apply(lambda x: 2020-x)
data['RoofStyle']=data['RoofStyle'].apply(lambda x: RoofStyle[x])
data['RoofMatl']=data['RoofMatl'].apply(lambda x: RoofMatl[x])
data['Exterior1st']=data['Exterior1st'].apply(lambda x: Exterior1st[x])
data['Exterior2nd']=data['Exterior2nd'].apply(lambda x: Exterior1st[x])
data.MasVnrType=data.MasVnrType.fillna('None')
data['MasVnrType']=data['MasVnrType'].apply(lambda x: MasVnrType[x])
data['ExterQual']=data['ExterQual'].apply(lambda x: ExterQual[x])
data['ExterCond']=data['ExterCond'].apply(lambda x: ExterQual[x])
data['Foundation']=data['Foundation'].apply(lambda x: Foundation[x])
data.BsmtQual=data.BsmtQual.fillna('NA')
data['BsmtQual']=data['BsmtQual'].apply(lambda x: BsmtQC[x])
data.BsmtCond=data.BsmtCond.fillna('NA')
data['BsmtCond']=data['BsmtCond'].apply(lambda x: BsmtQC[x])
data.BsmtExposure=data.BsmtExposure.fillna('NA')
data['BsmtExposure']=data['BsmtExposure'].apply(lambda x: BsmtExposure[x])
data.BsmtFinType1=data.BsmtFinType1.fillna('NA')
data['BsmtFinType1']=data['BsmtFinType1'].apply(lambda x: BsmtFinType1[x])
data.BsmtFinType2=data.BsmtFinType2.fillna('NA')
data['BsmtFinType2']=data['BsmtFinType2'].apply(lambda x: BsmtFinType1[x])
data['Heating']=data['Heating'].apply(lambda x: Heating[x])
data['HeatingQC']=data['HeatingQC'].apply(lambda x: HeatingQC[x])
data['CentralAir']=data['CentralAir'].apply(lambda x: CentralAir[x])
data.Electrical=data.Electrical.fillna('NA')
data['Electrical']=data['Electrical'].apply(lambda x: Electrical[x])
data['KitchenQual']=data['KitchenQual'].apply(lambda x: KitchenQual[x])
data['Functional']=data['Functional'].apply(lambda x: Functional[x])
data.FireplaceQu=data.FireplaceQu.fillna('NA')
data['FireplaceQu']=data['FireplaceQu'].apply(lambda x: FireplaceQu[x])
data.GarageType=data.GarageType.fillna('NA')
data['GarageType']=data['GarageType'].apply(lambda x: GarageType[x])
data['GarageYrBlt']=data['GarageYrBlt'].apply(lambda x: 2020-x)
data.GarageFinish=data.GarageFinish.fillna('NA')
data['GarageFinish']=data['GarageFinish'].apply(lambda x: GarageFinish[x])
data.GarageQual=data.GarageQual.fillna('NA')
data['GarageQual']=data['GarageQual'].apply(lambda x: GarageQual[x])
data.GarageCond=data.GarageCond.fillna('NA')
data['GarageCond']=data['GarageCond'].apply(lambda x: GarageQual[x])
data['PavedDrive']=data['PavedDrive'].apply(lambda x: PavedDrive[x])
data.PoolQC=data.PoolQC.fillna('NA')
data['PoolQC']=data['PoolQC'].apply(lambda x: PoolQC[x])
data.Fence=data.Fence.fillna('NA')
data['Fence']=data['Fence'].apply(lambda x: Fence[x])
MiscFeature = pd.get_dummies(data['MiscFeature'])
data = pd.concat([data,MiscFeature],axis=1)
data=data.drop('MiscFeature',axis=1)
data=data.drop('MoSold',axis=1)
data['YrSold']=data['YrSold'].apply(lambda x: 2020-x)
#data['New']=data['SaleType'].apply(lambda x: if x=='New': 1 else: 0)
data.loc[data['SaleType']=='New','New']=1
data.New=data.New.fillna(0)
data=data.drop('SaleType',axis=1)
SaleCondition= pd.get_dummies(data['SaleCondition'])
data = pd.concat([data,SaleCondition],axis=1)
data=data.drop('SaleCondition',axis=1)


#data.loc[0:50,'PoolQC':]
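The year-to-age conversion and the "New" flag used in the cell above can also be written without `apply` or a two-step fill. This is a sketch on a toy frame with made-up rows; `REFERENCE_YEAR` is a hypothetical name for the 2020 constant used in the notebook.

```python
import pandas as pd

REFERENCE_YEAR = 2020  # same reference year as in the cell above

toy = pd.DataFrame({
    'YearBuilt': [2003, 1976, 2008],
    'SaleType': ['WD', 'New', 'COD'],
})

# Vectorised subtraction instead of apply(lambda x: 2020 - x)
toy['YearBuilt'] = REFERENCE_YEAR - toy['YearBuilt']

# Boolean comparison cast to float gives 1.0 for 'New' and 0.0 otherwise
toy['New'] = (toy['SaleType'] == 'New').astype(float)
toy = toy.drop('SaleType', axis=1)
print(toy)
```

Keeping the reference year in one named constant makes it easier to change later than editing every `2020-x` expression.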

### Function that performs the same transformations

In [8]:
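def p_h_p_data_handling(data):
    """Apply the encodings defined above to a raw housing DataFrame and return the result.

    Columns of the frame passed in are modified in place, so pass a copy.
    """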
    data['MSZoning']=data['MSZoning'].apply(lambda x: MSZoning[x])
    data.LotFrontage=data.LotFrontage.fillna(0)
    data['Street']=data['Street'].apply(lambda x: Street[x])
    data.Alley=data.Alley.fillna('unknown')
    data['Alley']=data['Alley'].apply(lambda x: Street[x])
    data['LotShape']=data['LotShape'].apply(lambda x: LotShape[x])
    data['LandContour']=data['LandContour'].apply(lambda x: LandContour[x])
    data['Utilities']=data['Utilities'].apply(lambda x: Utilities[x])
    data['LotConfig']=data['LotConfig'].apply(lambda x: LotConfig[x])
    data['LandSlope']=data['LandSlope'].apply(lambda x: LandSlope[x])
    # Look up each house's distance from the centre by its neighbourhood code
    data['Distance_from_centre']=data['Neighborhood'].map(Neighborhood.set_index('Code')['Distance_from_centre'])
    data['Condition1']=data['Condition1'].apply(lambda x: Condition[x])
    data['Condition2']=data['Condition2'].apply(lambda x: Condition[x])
    data['BldgType']=data['BldgType'].apply(lambda x: BldgType[x])
    data['HouseStyle']=data['HouseStyle'].apply(lambda x: HouseStyle[x])
    data['YearBuilt']=data['YearBuilt'].apply(lambda x: 2020-x)
    data['YearRemodAdd']=data['YearRemodAdd'].apply(lambda x: 2020-x)
    data['RoofStyle']=data['RoofStyle'].apply(lambda x: RoofStyle[x])
    data['RoofMatl']=data['RoofMatl'].apply(lambda x: RoofMatl[x])
    data['Exterior1st']=data['Exterior1st'].apply(lambda x: Exterior1st[x])
    data['Exterior2nd']=data['Exterior2nd'].apply(lambda x: Exterior1st[x])
    data.MasVnrType=data.MasVnrType.fillna('None')
    data['MasVnrType']=data['MasVnrType'].apply(lambda x: MasVnrType[x])
    data['ExterQual']=data['ExterQual'].apply(lambda x: ExterQual[x])
    data['ExterCond']=data['ExterCond'].apply(lambda x: ExterQual[x])
    data['Foundation']=data['Foundation'].apply(lambda x: Foundation[x])
    data.BsmtQual=data.BsmtQual.fillna('NA')
    data['BsmtQual']=data['BsmtQual'].apply(lambda x: BsmtQC[x])
    data.BsmtCond=data.BsmtCond.fillna('NA')
    data['BsmtCond']=data['BsmtCond'].apply(lambda x: BsmtQC[x])
    data.BsmtExposure=data.BsmtExposure.fillna('NA')
    data['BsmtExposure']=data['BsmtExposure'].apply(lambda x: BsmtExposure[x])
    data.BsmtFinType1=data.BsmtFinType1.fillna('NA')
    data['BsmtFinType1']=data['BsmtFinType1'].apply(lambda x: BsmtFinType1[x])
    data.BsmtFinType2=data.BsmtFinType2.fillna('NA')
    data['BsmtFinType2']=data['BsmtFinType2'].apply(lambda x: BsmtFinType1[x])
    data['Heating']=data['Heating'].apply(lambda x: Heating[x])
    data['HeatingQC']=data['HeatingQC'].apply(lambda x: HeatingQC[x])
    data['CentralAir']=data['CentralAir'].apply(lambda x: CentralAir[x])
    data.Electrical=data.Electrical.fillna('NA')
    data['Electrical']=data['Electrical'].apply(lambda x: Electrical[x])
    data['KitchenQual']=data['KitchenQual'].apply(lambda x: KitchenQual[x])
    data['Functional']=data['Functional'].apply(lambda x: Functional[x])
    data.FireplaceQu=data.FireplaceQu.fillna('NA')
    data['FireplaceQu']=data['FireplaceQu'].apply(lambda x: FireplaceQu[x])
    data.GarageType=data.GarageType.fillna('NA')
    data['GarageType']=data['GarageType'].apply(lambda x: GarageType[x])
    data['GarageYrBlt']=data['GarageYrBlt'].apply(lambda x: 2020-x)
    data.GarageFinish=data.GarageFinish.fillna('NA')
    data['GarageFinish']=data['GarageFinish'].apply(lambda x: GarageFinish[x])
    data.GarageQual=data.GarageQual.fillna('NA')
    data['GarageQual']=data['GarageQual'].apply(lambda x: GarageQual[x])
    data.GarageCond=data.GarageCond.fillna('NA')
    data['GarageCond']=data['GarageCond'].apply(lambda x: GarageQual[x])
    data['PavedDrive']=data['PavedDrive'].apply(lambda x: PavedDrive[x])
    data.PoolQC=data.PoolQC.fillna('NA')
    data['PoolQC']=data['PoolQC'].apply(lambda x: PoolQC[x])
    data.Fence=data.Fence.fillna('NA')
    data['Fence']=data['Fence'].apply(lambda x: Fence[x])
    MiscFeature = pd.get_dummies(data['MiscFeature'])
    data = pd.concat([data,MiscFeature],axis=1)
    data=data.drop('MiscFeature',axis=1)
    data=data.drop('MoSold',axis=1)
    data['YrSold']=data['YrSold'].apply(lambda x: 2020-x)
    #data['New']=data['SaleType'].apply(lambda x: if x=='New': 1 else: 0)
    data.loc[data['SaleType']=='New','New']=1
    data.New=data.New.fillna(0)
    data=data.drop('SaleType',axis=1)
    SaleCondition= pd.get_dummies(data['SaleCondition'])
    data = pd.concat([data,SaleCondition],axis=1)
    data=data.drop('SaleCondition',axis=1)
    return data

In [10]:
data=php_data.copy()
data=data.drop('Id',axis=1)

data=p_h_p_data_handling(data)
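Since most lines of the function follow the same pattern (optionally fill missing values, then map through a dictionary), the transformation could be driven by a table instead. The sketch below covers only four columns and assumes the same dictionaries as above; `ENCODINGS` and `apply_encodings` are hypothetical names, not part of the notebook.

```python
import pandas as pd

# Two of the notebook's dictionaries, repeated so the sketch is self-contained
Street = {'Grvl': 1, 'Pave': 2, 'unknown': 0}
ExterQual = {'Fa': 2, 'Gd': 4, 'Po': 1, 'TA': 3, 'Ex': 5}

# column -> (mapping, value used to fill missing entries, or None)
ENCODINGS = {
    'Street': (Street, None),
    'Alley': (Street, 'unknown'),
    'ExterQual': (ExterQual, None),
    'ExterCond': (ExterQual, None),
}

def apply_encodings(frame, encodings):
    """Return a copy of `frame` with each listed column mapped to numbers."""
    out = frame.copy()
    for column, (mapping, fill) in encodings.items():
        values = out[column] if fill is None else out[column].fillna(fill)
        out[column] = values.map(mapping)
    return out

raw = pd.DataFrame({
    'Street': ['Pave', 'Grvl'],
    'Alley': [None, 'Pave'],
    'ExterQual': ['Gd', 'TA'],
    'ExterCond': ['TA', 'Fa'],
})
encoded = apply_encodings(raw, ENCODINGS)
print(encoded)
```

A table like this keeps the train and test transformations in one place and leaves the input frame untouched, at the cost of special-casing the few steps that do not fit the pattern (ages, dummy variables, dropped columns).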

In [23]:
data.Neighborhood

0       CollgCr
1       Veenker
2       CollgCr
3       Crawfor
4       NoRidge
         ...   
1016    CollgCr
1017    StoneBr
1018    Gilbert
1019    Blmngtn
1020    Edwards
Name: Neighborhood, Length: 1021, dtype: object
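As the output above shows, the Neighborhood column is still stored as strings after the transformation. A quick way to list any columns that are not yet numeric, sketched on a toy frame with made-up values, is `select_dtypes`:

```python
import pandas as pd

toy = pd.DataFrame({
    'LotArea': [8450, 9600],
    'Neighborhood': ['CollgCr', 'Veenker'],
    'Distance_from_centre': [1.5, 4.2],   # made-up distances for the toy frame
})

# Columns that are not numeric would need encoding or dropping before modelling
non_numeric = toy.select_dtypes(exclude='number').columns.tolist()
print(non_numeric)
```

Whether such a column is dropped or encoded further is a modelling choice; the check only makes the leftover columns visible.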

### Saving feature table

In [42]:
data.to_csv('data/Features_xy.csv')

### Separating variables and target

In [39]:
y_train = data['SalePrice']
X_train = data.drop('SalePrice',axis=1)

In [40]:
X_train.head()

Unnamed: 0,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,...,Gar2,Othr,Shed,New,Abnorml,AdjLand,Alloca,Family,Normal,Partial
0,60,6,65.0,8450,2,0,0,0,0,0,...,0,0,0,0.0,0,0,0,0,1,0
1,20,6,80.0,9600,2,0,0,0,0,3,...,0,0,0,0.0,0,0,0,0,1,0
2,60,6,68.0,11250,2,0,1,0,0,0,...,0,0,0,0.0,0,0,0,0,1,0
3,70,6,60.0,9550,2,0,1,0,0,1,...,0,0,0,0.0,1,0,0,0,0,0
4,60,6,84.0,14260,2,0,1,0,0,3,...,0,0,0,0.0,0,0,0,0,1,0


## Test data

In [24]:
php_test= pd.read_csv('data/Test.csv')

In [25]:
php_test.shape

(439, 80)

In [9]:
test_data=php_test.copy()
test_id = test_data['Id']
test_data=test_data.drop('Id',axis=1)

test_feat = p_h_p_data_handling(test_data)
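Because `MiscFeature` and `SaleCondition` go through `pd.get_dummies`, the test feature table may end up with different dummy columns from the training table. One possible safeguard, sketched here on toy frames with illustrative column names, is to reindex the test features on the training columns:

```python
import pandas as pd

# Toy stand-ins for the train and test feature tables
train_feat = pd.DataFrame({'LotArea': [8450, 9600], 'Gar2': [0, 1], 'Shed': [1, 0]})
test_toy = pd.DataFrame({'LotArea': [11250], 'Shed': [1], 'TenC': [0]})

# Keep exactly the training columns, in training order;
# dummy columns missing from the test data are created and filled with 0
test_aligned = test_toy.reindex(columns=train_feat.columns, fill_value=0)
print(test_aligned)
```

Columns that exist only in the test data are dropped by this step, which is reasonable here because a model fitted on the training table has no coefficient for them.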

In [21]:
test_feat.to_csv('data/Test_Features_xy.csv')

In [10]:
test_id.to_csv('data/test_id.csv')