Lambda School Data Science

*Unit 2, Sprint 2, Module 2*

---

# Random Forests

## Assignment
- [ ] Read [“Adopting a Hypothesis-Driven Workflow”](https://outline.com/5S5tsB), a blog post by a Lambda DS student about the Tanzania Waterpumps challenge.
- [x] Continue to participate in our Kaggle challenge.
- [x] Define a function to wrangle train, validate, and test sets in the same way. Clean outliers and engineer features.
- [x] Try Ordinal Encoding.
- [x] Try a Random Forest Classifier.
- [x] Submit your predictions to our Kaggle competition. (Go to our Kaggle InClass competition webpage. Use the blue **Submit Predictions** button to upload your CSV file. Or you can use the Kaggle API to submit your predictions.)
- [x] Commit your notebook to your fork of the GitHub repo.

## Stretch Goals

### Doing
- [ ] Add your own stretch goal(s)!
- [ ] Do more exploratory data analysis, data cleaning, feature engineering, and feature selection.
- [ ] Try other [categorical encodings](https://contrib.scikit-learn.org/categorical-encoding/).
- [ ] Get and plot your feature importances.
- [ ] Make visualizations and share on Slack.

### Reading

Top recommendations in _**bold italic:**_

#### Decision Trees
- A Visual Introduction to Machine Learning, [Part 1: A Decision Tree](http://www.r2d3.us/visual-intro-to-machine-learning-part-1/), and _**[Part 2: Bias and Variance](http://www.r2d3.us/visual-intro-to-machine-learning-part-2/)**_
- [Decision Trees: Advantages & Disadvantages](https://christophm.github.io/interpretable-ml-book/tree.html#advantages-2)
- [How a Russian mathematician constructed a decision tree — by hand — to solve a medical problem](http://fastml.com/how-a-russian-mathematician-constructed-a-decision-tree-by-hand-to-solve-a-medical-problem/)
- [How decision trees work](https://brohrer.github.io/how_decision_trees_work.html)
- [Let’s Write a Decision Tree Classifier from Scratch](https://www.youtube.com/watch?v=LDRbO9a6XPU)

#### Random Forests
- [_An Introduction to Statistical Learning_](http://www-bcf.usc.edu/~gareth/ISL/), Chapter 8: Tree-Based Methods
- [Coloring with Random Forests](http://structuringtheunstructured.blogspot.com/2017/11/coloring-with-random-forests.html)
- _**[Random Forests for Complete Beginners: The definitive guide to Random Forests and Decision Trees](https://victorzhou.com/blog/intro-to-random-forests/)**_

#### Categorical encoding for trees
- [Are categorical variables getting lost in your random forests?](https://roamanalytics.com/2016/10/28/are-categorical-variables-getting-lost-in-your-random-forests/)
- [Beyond One-Hot: An Exploration of Categorical Variables](http://www.willmcginnis.com/2015/11/29/beyond-one-hot-an-exploration-of-categorical-variables/)
- _**[Categorical Features and Encoding in Decision Trees](https://medium.com/data-design/visiting-categorical-features-and-encoding-in-decision-trees-53400fa65931)**_
- _**[Coursera — How to Win a Data Science Competition: Learn from Top Kagglers — Concept of mean encoding](https://www.coursera.org/lecture/competitive-data-science/concept-of-mean-encoding-b5Gxv)**_
- [Mean (likelihood) encodings: a comprehensive study](https://www.kaggle.com/vprokopev/mean-likelihood-encodings-a-comprehensive-study)
- [The Mechanics of Machine Learning, Chapter 6: Categorically Speaking](https://mlbook.explained.ai/catvars.html)

#### Imposter Syndrome
- [Effort Shock and Reward Shock (How The Karate Kid Ruined The Modern World)](http://www.tempobook.com/2014/07/09/effort-shock-and-reward-shock/)
- [How to manage impostor syndrome in data science](https://towardsdatascience.com/how-to-manage-impostor-syndrome-in-data-science-ad814809f068)
- ["I am not a real data scientist"](https://brohrer.github.io/imposter_syndrome.html)
- _**[Imposter Syndrome in Data Science](https://caitlinhudon.com/2018/01/19/imposter-syndrome-in-data-science/)**_


### More Categorical Encodings

**1.** The article **[Categorical Features and Encoding in Decision Trees](https://medium.com/data-design/visiting-categorical-features-and-encoding-in-decision-trees-53400fa65931)** mentions 4 encodings:

- **"Categorical Encoding":** This means using the raw categorical values as-is, not encoded. Scikit-learn doesn't support this, but some tree algorithm implementations do. For example, [CatBoost](https://catboost.ai/), or R's [rpart](https://cran.r-project.org/web/packages/rpart/index.html) package.
- **Numeric Encoding:** Synonymous with Label Encoding, or "Ordinal" Encoding with an arbitrary order. We can use [category_encoders.OrdinalEncoder](https://contrib.scikit-learn.org/categorical-encoding/ordinal.html).
- **One-Hot Encoding:** We can use [category_encoders.OneHotEncoder](http://contrib.scikit-learn.org/categorical-encoding/onehot.html).
- **Binary Encoding:** We can use [category_encoders.BinaryEncoder](http://contrib.scikit-learn.org/categorical-encoding/binary.html).
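
To make the last three encodings concrete, here is a minimal hand-rolled sketch on a made-up column. It only illustrates the idea; the category_encoders classes linked above may number categories and name output columns differently.

```python
# Toy column with three categories (values are made up for illustration).
colors = ['red', 'green', 'blue', 'green']

# Numeric / ordinal encoding: one integer per category,
# assigned here in order of first appearance.
codes = {}
for value in colors:
    codes.setdefault(value, len(codes) + 1)
ordinal = [codes[value] for value in colors]

# One-hot encoding: one 0/1 column per category.
one_hot = [[int(value == category) for category in codes] for value in colors]

# Binary encoding: write each integer code in base 2, one column per bit.
n_bits = max(codes.values()).bit_length()
binary = [[(codes[value] >> bit) & 1 for bit in reversed(range(n_bits))]
          for value in colors]

print(ordinal)  # [1, 2, 3, 2]
print(one_hot)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
print(binary)   # [[0, 1], [1, 0], [1, 1], [1, 0]]
```

With k categories, one-hot encoding adds k columns, while binary encoding needs only about log2(k) columns. That difference matters most for high-cardinality features.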


**2.** The short video **[Coursera — How to Win a Data Science Competition: Learn from Top Kagglers — Concept of mean encoding](https://www.coursera.org/lecture/competitive-data-science/concept-of-mean-encoding-b5Gxv)** introduces an interesting idea: use both X _and_ y to encode categoricals.

Category Encoders has multiple implementations of this general concept:

- [CatBoost Encoder](http://contrib.scikit-learn.org/categorical-encoding/catboost.html)
- [James-Stein Encoder](http://contrib.scikit-learn.org/categorical-encoding/jamesstein.html)
- [Leave One Out](http://contrib.scikit-learn.org/categorical-encoding/leaveoneout.html)
- [M-estimate](http://contrib.scikit-learn.org/categorical-encoding/mestimate.html)
- [Target Encoder](http://contrib.scikit-learn.org/categorical-encoding/targetencoder.html)
- [Weight of Evidence](http://contrib.scikit-learn.org/categorical-encoding/woe.html)

Category Encoders' mean encoding implementations work for regression problems or binary classification problems.
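
The general idea can be sketched in a few lines of plain Python. This is an illustration only: the function names and the smoothing weight `m` are hypothetical, and the simple blend with the overall mean used here is not necessarily the formula any of the encoders listed above uses.

```python
from collections import defaultdict

def fit_mean_encoding(categories, targets, m=10.0):
    """Map each category to a smoothed mean of a 0/1 (or numeric) target.

    m is a hypothetical smoothing weight: larger values pull rare
    categories harder toward the overall mean.
    """
    overall = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    mapping = {c: (sums[c] + m * overall) / (counts[c] + m) for c in counts}
    return mapping, overall

def apply_mean_encoding(categories, mapping, overall):
    # Categories never seen during fitting fall back to the overall mean.
    return [mapping.get(c, overall) for c in categories]

# Made-up training data: 1 = functional, 0 = anything else.
train_basin = ['A', 'A', 'A', 'B']
train_y = [1, 1, 0, 0]
mapping, overall = fit_mean_encoding(train_basin, train_y, m=1.0)

# Encode new rows using only what was learned from the training data.
encoded_val = apply_mean_encoding(['A', 'B', 'C'], mapping, overall)
print(encoded_val)  # [0.625, 0.25, 0.5]
```

Note that the mapping is learned from the training rows only and then applied to other rows without looking at their targets. Without that separation, the encoded feature would leak the answer.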

For a multi-class classification problem, you will need to temporarily reformulate it as binary classification. For example:

```python
import category_encoders as ce

encoder = ce.TargetEncoder(min_samples_leaf=..., smoothing=...) # Both parameters > 1 to avoid overfitting
X_train_encoded = encoder.fit_transform(X_train, y_train=='functional')
X_val_encoded = encoder.transform(X_val) # Reuse the encoding learned from the training set
```

For this reason, mean encoding won't work well within pipelines for multi-class classification problems.
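
One possible workaround, done by hand outside of any pipeline, is to repeat the binary reformulation once per class and keep one encoded column per class. The sketch below assumes pandas; the helper name `one_vs_rest_mean_encode` is hypothetical, the toy values are made up, and no smoothing is applied, so rare categories would overfit on real data.

```python
import pandas as pd

def one_vs_rest_mean_encode(train_col, train_y, other_col):
    """Return one mean-encoded column per class, fit on the training data only."""
    encoded = pd.DataFrame(index=other_col.index)
    for label in sorted(train_y.unique()):
        is_label = (train_y == label).astype(float)
        # Share of each category's training rows that carry this label.
        means = is_label.groupby(train_col).mean()
        # Unseen categories fall back to the label's overall training share.
        encoded[f'{train_col.name}_{label}'] = (
            other_col.map(means).fillna(is_label.mean())
        )
    return encoded

# Toy data (made up): one categorical feature and a three-class target.
train_col = pd.Series(['dry', 'dry', 'enough', 'enough'], name='quantity')
train_y = pd.Series(['non functional', 'non functional',
                     'functional', 'functional needs repair'])
val_col = pd.Series(['dry', 'enough', 'unknown'], name='quantity')

val_encoded = one_vs_rest_mean_encode(train_col, train_y, val_col)
print(val_encoded)
```

Each validation row gets three numbers: the share of training rows in its category that were functional, functional needs repair, and non functional.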

**3.** The **[dirty_cat](https://dirty-cat.github.io/stable/)** library has a Target Encoder implementation that works with multi-class classification.

```python
dirty_cat.TargetEncoder(clf_type='multiclass-clf')
```
It also implements an interesting idea called ["Similarity Encoder" for dirty categories](https://www.slideshare.net/GaelVaroquaux/machine-learning-on-non-curated-data-154905090).

However, it seems like dirty_cat doesn't handle missing values or unknown categories as well as category_encoders does. And you may need to use it with one column at a time, instead of with your whole dataframe.

**4. [Embeddings](https://www.kaggle.com/learn/embeddings)** can work well with sparse / high cardinality categoricals.

_**I hope it’s not too frustrating or confusing that there’s not one “canonical” way to encode categoricals. It’s an active area of research and experimentation! Maybe you can make your own contributions!**_

### Setup

You can work locally (follow the [local setup instructions](https://lambdaschool.github.io/ds/unit2/local/)) or on Colab (run the code cell below).

In [2]:
%%capture
import sys

# If you're on Colab:
if 'google.colab' in sys.modules:
    DATA_PATH = 'https://raw.githubusercontent.com/LambdaSchool/DS-Unit-2-Kaggle-Challenge/master/data/'
    !pip install category_encoders==2.*

# If you're working locally:
else:
    DATA_PATH = '../data/'

In [175]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

pd.set_option('display.max_columns', 999)

train = pd.merge(pd.read_csv(DATA_PATH+'waterpumps/train_features.csv'), 
                 pd.read_csv(DATA_PATH+'waterpumps/train_labels.csv'))
test = pd.read_csv(DATA_PATH+'waterpumps/test_features.csv')
sample_submission = pd.read_csv(DATA_PATH+'waterpumps/sample_submission.csv')

train.shape, test.shape

((59400, 41), (14358, 40))

In [176]:
train, val = train_test_split(train, random_state=17)

In [177]:
train

Unnamed: 0,id,amount_tsh,date_recorded,funder,gps_height,installer,longitude,latitude,wpt_name,num_private,basin,subvillage,region,region_code,district_code,lga,ward,population,public_meeting,recorded_by,scheme_management,scheme_name,permit,construction_year,extraction_type,extraction_type_group,extraction_type_class,management,management_group,payment,payment_type,water_quality,quality_group,quantity,quantity_group,source,source_type,source_class,waterpoint_type,waterpoint_type_group,status_group
14158,34470,0.0,2011-07-29,Hesawa,0,DWE,33.037573,-2.503828,Kwa Mbisu,0,Lake Victoria,Busekwa,Mwanza,19,2,Magu,Bujashi,0,True,GeoData Consultants Ltd,VWC,,True,0,other,other,other,vwc,user-group,never pay,never pay,soft,good,insufficient,insufficient,shallow well,shallow well,groundwater,other,other,non functional
9080,55171,200.0,2013-01-20,Rvemp,1155,DWE,33.378965,-2.154466,Kwachisaku Mwndu,0,Lake Victoria,Legeza,Mara,20,4,Bunda,Nansimo,300,True,GeoData Consultants Ltd,WUG,,False,2003,other,other,other,wug,user-group,pay monthly,monthly,soft,good,enough,enough,shallow well,shallow well,groundwater,hand pump,hand pump,non functional
45469,72157,0.0,2011-04-09,Government Of Tanzania,0,Central Government,33.429493,-9.026036,Church Of God,0,Lake Rukwa,Ihova,Mbeya,12,2,Mbeya Rural,Iyunga mapinduzi,0,False,GeoData Consultants Ltd,Parastatal,,False,0,gravity,gravity,gravity,parastatal,parastatal,never pay,never pay,soft,good,seasonal,seasonal,rainwater harvesting,rainwater harvesting,surface,communal standpipe,communal standpipe,functional
41992,53472,0.0,2011-03-12,Amref,13,AMREF,39.213282,-7.211929,Kwa Mbwela,0,Wami / Ruvu,Ngarambe Kuu,Pwani,6,4,Mkuranga,Mbezi,150,True,GeoData Consultants Ltd,VWC,,False,2003,swn 80,swn 80,handpump,vwc,user-group,never pay,never pay,soft,good,enough,enough,shallow well,shallow well,groundwater,hand pump,hand pump,functional
21571,14717,0.0,2011-03-02,,-37,,39.655339,-7.916720,Kwa Kasimu,0,Rufiji,Msufini,Pwani,60,60,Mafia,Kilindoni,27,True,GeoData Consultants Ltd,VWC,,False,0,submersible,submersible,submersible,vwc,user-group,never pay,never pay,soft,good,enough,enough,machine dbh,borehole,groundwater,communal standpipe,communal standpipe,functional
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
42297,19959,0.0,2012-10-10,Dwsp,0,DWE,33.469236,-3.607610,Shuleni,0,Internal,Uzogore,Shinyanga,17,7,Shinyanga Urban,Ibadakuli,0,False,GeoData Consultants Ltd,Parastatal,,False,0,other - rope pump,rope pump,rope pump,parastatal,parastatal,unknown,unknown,soft,good,seasonal,seasonal,rainwater harvesting,rainwater harvesting,surface,communal standpipe,communal standpipe,functional needs repair
33174,31378,0.0,2013-01-31,Finw,279,FinW,39.587734,-10.682716,Shuleni,0,Ruvuma / Southern Coast,Nachunyu Bondeni,Mtwara,9,4,Tandahimba,Lyenje,600,True,GeoData Consultants Ltd,Water Board,Borehole,True,1982,submersible,submersible,submersible,vwc,user-group,never pay,never pay,soft,good,dry,dry,machine dbh,borehole,groundwater,communal standpipe multiple,communal standpipe,non functional
46470,73223,0.0,2011-03-04,Rc Ch,1736,RC Ch,34.875985,-9.591650,none,0,Lake Nyasa,Mawulo,Iringa,11,4,Njombe,Iwungilo,20,False,GeoData Consultants Ltd,,Uganda,False,1995,gravity,gravity,gravity,unknown,unknown,never pay,never pay,soft,good,enough,enough,spring,spring,groundwater,communal standpipe,communal standpipe,functional
34959,49904,0.0,2013-03-20,World Vision,1322,Community,36.797282,-3.355298,Nikodemo Risa,0,Pangani,Kyaraa,Arusha,2,7,Meru,Singisi,146,True,GeoData Consultants Ltd,VWC,Seela Sing'isi gravity water supply,True,1998,gravity,gravity,gravity,vwc,user-group,never pay,never pay,soft,good,enough,enough,spring,spring,groundwater,communal standpipe,communal standpipe,non functional


In [178]:
def wrangle(data):
    df = data.copy()
    
    weird_mismatches = {
        5: ('Tanga', 4),
        11: ('Shinyanga', 17),
        14: ('Shinyanga', 17),
        17: ('Mwanza', 19),
        18: ('Lindi', 8),
        24: ('Arusha', 2),
        40: ('Pwani', 6),
        60: ('Pwani', 6),
        80: ('Lindi', 8),
        90: ('Mtwara', 9),
        99: ('Mtwara', 9)
    }
    for code in weird_mismatches:
        wrong_region, right_code = weird_mismatches[code]
        df.loc[(df.region_code == code) & (df.region == wrong_region), 'region_code'] = right_code
    
    # About 3% of the time, latitude has small values near zero,
    # outside Tanzania, so we'll treat these values like zero.
    df['latitude'] = df['latitude'].replace(-2e-08, 0)
    
    # When columns have zeros and shouldn't, they are like null values.
    # So we will replace the zeros with nulls, and impute missing values later.
    # (A "missing indicator" column could also be added here, because the fact
    # that values are missing may be a predictive signal; this version doesn't.)
    cols_with_zeros = ['longitude', 'latitude', 'construction_year', 
                       'gps_height', 'population', 'amount_tsh']
    for col in cols_with_zeros:
        df[col] = df[col].replace(0, np.nan)
    
    # Convert date_recorded to datetime
    df['date_recorded'] = pd.to_datetime(df['date_recorded'], infer_datetime_format=True)
    
    # Extract components from date_recorded, then drop the original column
    df['year_recorded'] = df['date_recorded'].dt.year
    df['month_recorded'] = df['date_recorded'].dt.month
    df['day_recorded'] = df['date_recorded'].dt.day
    df = df.drop(columns='date_recorded')
    
    # Engineer feature: how many years from construction_year to date_recorded
    df['years'] = df['year_recorded'] - df['construction_year']
    
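    # Drop columns that look redundant: most duplicate or closely mirror
    # a column we keep (e.g. payment vs. payment_type), and recorded_by
    # appears to hold a single value.
    evil_dimensions = ['extraction_type_group', 'management', 'water_quality',
                       'payment', 'extraction_type', 'waterpoint_type_group',
                       'scheme_management', 'quantity_group', 'source',
                       'source_class', 'recorded_by', 'region']
    df = df.drop(columns=evil_dimensions)

    # Return the wrangled dataframe
    return df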

train = wrangle(train)
val = wrangle(val)
test = wrangle(test)
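
`wrangle` replaces placeholder zeros with NaN so they can be imputed later. If the fact that a value was missing might itself be predictive, a "missing indicator" column can record it before imputation. A minimal sketch, assuming pandas and NumPy; the helper name `add_missing_indicators` and the toy values are hypothetical:

```python
import numpy as np
import pandas as pd

def add_missing_indicators(df, cols):
    """Replace placeholder zeros with NaN and flag which rows were missing."""
    df = df.copy()
    for col in cols:
        df[col] = df[col].replace(0, np.nan)
        # True where the value was a placeholder zero (or already null).
        df[col + '_MISSING'] = df[col].isnull()
    return df

# Toy frame (made-up values) with zeros standing in for missing data.
demo = pd.DataFrame({'population': [0, 300, 150],
                     'gps_height': [1155, 0, 13]})
flagged = add_missing_indicators(demo, ['population', 'gps_height'])
print(flagged)
```

The indicator columns survive imputation, so a tree can still split on "this value was originally missing" after the NaNs are filled in.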

In [179]:
train

Unnamed: 0,id,amount_tsh,funder,gps_height,installer,longitude,latitude,wpt_name,num_private,basin,subvillage,region_code,district_code,lga,ward,population,public_meeting,scheme_name,permit,construction_year,extraction_type_class,management_group,payment_type,quality_group,quantity,source_type,waterpoint_type,status_group,year_recorded,month_recorded,day_recorded,years
14158,34470,,Hesawa,,DWE,33.037573,-2.503828,Kwa Mbisu,0,Lake Victoria,Busekwa,19,2,Magu,Bujashi,,True,,True,,other,user-group,never pay,good,insufficient,shallow well,other,non functional,2011,7,29,
9080,55171,200.0,Rvemp,1155.0,DWE,33.378965,-2.154466,Kwachisaku Mwndu,0,Lake Victoria,Legeza,20,4,Bunda,Nansimo,300.0,True,,False,2003.0,other,user-group,monthly,good,enough,shallow well,hand pump,non functional,2013,1,20,10.0
45469,72157,,Government Of Tanzania,,Central Government,33.429493,-9.026036,Church Of God,0,Lake Rukwa,Ihova,12,2,Mbeya Rural,Iyunga mapinduzi,,False,,False,,gravity,parastatal,never pay,good,seasonal,rainwater harvesting,communal standpipe,functional,2011,4,9,
41992,53472,,Amref,13.0,AMREF,39.213282,-7.211929,Kwa Mbwela,0,Wami / Ruvu,Ngarambe Kuu,6,4,Mkuranga,Mbezi,150.0,True,,False,2003.0,handpump,user-group,never pay,good,enough,shallow well,hand pump,functional,2011,3,12,8.0
21571,14717,,,-37.0,,39.655339,-7.916720,Kwa Kasimu,0,Rufiji,Msufini,6,60,Mafia,Kilindoni,27.0,True,,False,,submersible,user-group,never pay,good,enough,borehole,communal standpipe,functional,2011,3,2,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
42297,19959,,Dwsp,,DWE,33.469236,-3.607610,Shuleni,0,Internal,Uzogore,17,7,Shinyanga Urban,Ibadakuli,,False,,False,,rope pump,parastatal,unknown,good,seasonal,rainwater harvesting,communal standpipe,functional needs repair,2012,10,10,
33174,31378,,Finw,279.0,FinW,39.587734,-10.682716,Shuleni,0,Ruvuma / Southern Coast,Nachunyu Bondeni,9,4,Tandahimba,Lyenje,600.0,True,Borehole,True,1982.0,submersible,user-group,never pay,good,dry,borehole,communal standpipe multiple,non functional,2013,1,31,31.0
46470,73223,,Rc Ch,1736.0,RC Ch,34.875985,-9.591650,none,0,Lake Nyasa,Mawulo,11,4,Njombe,Iwungilo,20.0,False,Uganda,False,1995.0,gravity,unknown,never pay,good,enough,spring,communal standpipe,functional,2011,3,4,16.0
34959,49904,,World Vision,1322.0,Community,36.797282,-3.355298,Nikodemo Risa,0,Pangani,Kyaraa,2,7,Meru,Singisi,146.0,True,Seela Sing'isi gravity water supply,True,1998.0,gravity,user-group,never pay,good,enough,spring,communal standpipe,non functional,2013,3,20,15.0


In [180]:
# The status_group column is the target
target = 'status_group'

# Get a dataframe with all train columns except the target
# Also drop id from the features; test['id'] is used later to build the submission
train_features = train.drop(columns=[target, 'id'])

# Get a list of the numeric features
numeric_features = train_features.select_dtypes(include='number').columns.tolist()

# Get a series with the cardinality of the nonnumeric features
cardinality = train_features.select_dtypes(exclude='number').nunique()

# Get a list of all categorical features with cardinality <= 50
categorical_features = cardinality[cardinality <= 50].index.tolist()

# Combine the lists 
features = numeric_features + categorical_features

In [181]:
# Arrange data into X features matrix and y target vector 
X_train = train[features]
y_train = train[target]
X_val = val[features]
y_val = val[target]
X_test = test[features]

In [190]:
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from category_encoders import OneHotEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
from sklearn.preprocessing import StandardScaler, OrdinalEncoder

In [198]:
%%time

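# This attempt fails. Note that OrdinalEncoder here is scikit-learn's
# (imported from sklearn.preprocessing above), not category_encoders.OrdinalEncoder,
# and SimpleImputer() defaults to strategy='mean', which only suits numeric columns.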
pipeline = make_pipeline(
    SimpleImputer(),
    OrdinalEncoder(),
    IterativeImputer(random_state=0, imputation_order='descending'),
    StandardScaler(),
    RandomForestClassifier(random_state=0,
                           n_jobs=-1,
                           n_estimators=100)
)
pipeline.fit(X_train, y_train)
print('Validation Accuracy', pipeline.score(X_val, y_val))

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
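
One arrangement that avoids this kind of error is to impute and encode the categorical columns separately from the numeric ones, so no NaN or string reaches a step that can't handle it. The sketch below is only one option, not the notebook's final pipeline. It uses scikit-learn alone and runs on a small synthetic frame rather than the waterpumps data; the column values and the names `X_demo`, `y_demo`, and `demo_pipeline` are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder

# Small synthetic stand-in for X_train / y_train (values are made up).
rng = np.random.default_rng(0)
n_rows = 200
quantity = pd.Series(rng.choice(['enough', 'dry', 'seasonal'], size=n_rows),
                     dtype=object)
quantity[rng.random(n_rows) < 0.1] = np.nan  # sprinkle in missing categories
X_demo = pd.DataFrame({
    'gps_height': rng.choice([np.nan, 13.0, 1155.0], size=n_rows),
    'quantity': quantity,
})
y_demo = np.where(quantity == 'dry', 'non functional', 'functional')

# Categorical columns: fill NaN with the most frequent value, then encode.
categorical_steps = make_pipeline(
    SimpleImputer(strategy='most_frequent'),
    OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1),
)

# Numeric columns get a median imputer; everything else gets the steps above.
preprocess = make_column_transformer(
    (SimpleImputer(strategy='median'), make_column_selector(dtype_include='number')),
    (categorical_steps, make_column_selector(dtype_exclude='number')),
)

demo_pipeline = make_pipeline(
    preprocess,
    RandomForestClassifier(n_estimators=20, random_state=0, n_jobs=-1),
)
demo_pipeline.fit(X_demo, y_demo)
demo_accuracy = demo_pipeline.score(X_demo, y_demo)
print('Training accuracy on the synthetic data', demo_accuracy)
```

The sketch leaves out `StandardScaler` because tree-based models split on thresholds and don't depend on feature scale.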

In [138]:
final = pd.DataFrame(test['id'])
predictions = pipeline.predict(X_test)
final['status_group'] = predictions

In [144]:
final.to_csv('../submission.csv', index=False)

In [146]:
sample_submission.status_group.value_counts()

functional    14358
Name: status_group, dtype: int64

In [158]:
pipeline.named_steps['onehotencoder'].transform(X_train)

Unnamed: 0,amount_tsh,gps_height,longitude,latitude,num_private,region_code,district_code,population,construction_year,year_recorded,month_recorded,day_recorded,years,basin_Lake Victoria,basin_Lake Rukwa,basin_Wami / Ruvu,basin_Rufiji,basin_Pangani,basin_Lake Nyasa,basin_Internal,basin_Ruvuma / Southern Coast,basin_Lake Tanganyika,region_Mwanza,region_Mara,region_Mbeya,region_Pwani,region_Tanga,region_Iringa,region_Arusha,region_Shinyanga,region_Morogoro,region_Kagera,region_Ruvuma,region_Singida,region_Kigoma,region_Tabora,region_Dodoma,region_Dar es Salaam,region_Kilimanjaro,region_Manyara,region_Rukwa,region_Mtwara,region_Lindi,public_meeting_True,public_meeting_False,public_meeting_nan,permit_True,permit_False,permit_nan,extraction_type_class_other,extraction_type_class_gravity,extraction_type_class_handpump,extraction_type_class_submersible,extraction_type_class_motorpump,extraction_type_class_rope pump,extraction_type_class_wind-powered,management_group_user-group,management_group_parastatal,management_group_other,management_group_commercial,management_group_unknown,payment_type_never pay,payment_type_monthly,payment_type_per bucket,payment_type_unknown,payment_type_annually,payment_type_on failure,payment_type_other,quality_group_good,quality_group_milky,quality_group_unknown,quality_group_colored,quality_group_salty,quality_group_fluoride,quantity_insufficient,quantity_enough,quantity_seasonal,quantity_unknown,quantity_dry,source_type_shallow well,source_type_rainwater harvesting,source_type_borehole,source_type_river/lake,source_type_spring,source_type_other,source_type_dam,waterpoint_type_other,waterpoint_type_hand pump,waterpoint_type_communal standpipe,waterpoint_type_communal standpipe multiple,waterpoint_type_improved spring,waterpoint_type_cattle trough,waterpoint_type_dam,longitude_MISSING,latitude_MISSING,construction_year_MISSING,gps_height_MISSING,population_MISSING,amount_tsh_MISSING,years_MISSING
14158,,,33.037573,-2.503828,0,19,2,,,2011,7,29,,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,False,False,True,True,True,True,True
9080,200.0,1155.0,33.378965,-2.154466,0,20,4,300.0,2003.0,2013,1,20,10.0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,False,False,False,False,False,False,False
45469,,,33.429493,-9.026036,0,12,2,,,2011,4,9,,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,False,False,True,True,True,True,True
41992,,13.0,39.213282,-7.211929,0,6,4,150.0,2003.0,2011,3,12,8.0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,False,False,False,False,False,True,False
21571,,-37.0,39.655339,-7.916720,0,6,60,27.0,,2011,3,2,,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,False,False,True,False,False,True,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
42297,,,33.469236,-3.607610,0,17,7,,,2012,10,10,,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,False,False,True,True,True,True,True
33174,,279.0,39.587734,-10.682716,0,9,4,600.0,1982.0,2013,1,31,31.0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,False,False,False,False,False,True,False
46470,,1736.0,34.875985,-9.591650,0,11,4,20.0,1995.0,2011,3,4,16.0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,False,False,False,False,False,True,False
34959,,1322.0,36.797282,-3.355298,0,2,7,146.0,1998.0,2013,3,20,15.0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,False,False,False,False,False,True,False
