We are working with a dataset of real estate properties. The goal is to build a model that predicts the tax value (the 'TAX' column) of a property in a specific area from numerous independent variables describing the property.



---



# **DATASET**
This code uses pandas' 'read_csv()' function to load the 'data.csv' real estate dataset into the variable 'data'. Evaluating 'data' on the last line of the cell displays a truncated view of the DataFrame (its first and last few rows), which gives a quick overview of the dataset's structure and contents. The commented-out 'data.tail()' line would display only the last few rows, which is useful for checking where the dataset ends.

In [None]:
from pandas import read_csv
import pandas as pd
data=read_csv('data.csv')
data
#data.tail()

Unnamed: 0,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT,MEDV
0,0.00632,18.0,2.31,0,0.538,6.575,65.2,4.0900,1,296,15.3,396.90,4.98,24.0
1,0.02731,0.0,7.07,0,0.469,6.421,78.9,4.9671,2,242,17.8,396.90,9.14,21.6
2,0.02729,0.0,7.07,0,0.469,7.185,61.1,4.9671,2,242,17.8,392.83,4.03,34.7
3,0.03237,0.0,2.18,0,0.458,6.998,45.8,6.0622,3,222,18.7,394.63,2.94,33.4
4,0.06905,0.0,2.18,0,0.458,7.147,54.2,6.0622,3,222,18.7,396.90,5.33,36.2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
506,0.98765,0.0,12.50,0,0.561,6.980,89.0,2.0980,3,320,23.0,396.00,12.00,12.0
507,0.23456,0.0,12.50,0,0.561,6.980,76.0,2.6540,3,320,23.0,343.00,25.00,32.0
508,0.44433,0.0,12.50,0,0.561,6.123,98.0,2.9870,3,320,23.0,343.00,21.00,54.0
509,0.77763,0.0,12.70,0,0.561,6.222,34.0,2.5430,3,329,23.0,343.00,76.00,67.0




---



In [None]:
print(data.dtypes)

CRIM       float64
ZN         float64
INDUS      float64
CHAS         int64
NOX        float64
RM         float64
AGE        float64
DIS        float64
RAD          int64
TAX          int64
PTRATIO    float64
B          float64
LSTAT      float64
MEDV       float64
dtype: object




---

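The code data['TAX'] = pd.to_numeric(data['TAX'], errors='coerce') converts the values in the 'TAX' column of the DataFrame data to a numeric data type using the pandas to_numeric() function. With errors='coerce', any value that cannot be parsed as a number is replaced with NaN instead of raising an error. Since 'TAX' is already int64 (see the dtypes above), this step only acts as a safeguard and leaves the values unchanged.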

In [None]:
data['TAX'] = pd.to_numeric(data['TAX'], errors='coerce')
print(data['TAX'])

0      296
1      242
2      242
3      222
4      222
      ... 
506    320
507    320
508    320
509    329
510    345
Name: TAX, Length: 511, dtype: int64
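To illustrate what errors='coerce' does, here is a minimal sketch on a small hypothetical Series of strings (not the real data) that contains one malformed entry:

```python
import pandas as pd

# Hypothetical raw TAX values read as strings, including one malformed entry
raw_tax = pd.Series(["296", "242", "n/a", "222"], name="TAX")

# errors='coerce': unparseable values become NaN instead of raising
tax = pd.to_numeric(raw_tax, errors="coerce")
print(tax)
print("unparseable entries:", tax.isna().sum())

# The default errors='raise' stops at the first unparseable value
raised = False
try:
    pd.to_numeric(raw_tax)
except ValueError as err:
    raised = True
    print("errors='raise' fails:", err)
```

Because NaN is a float, a column that receives any coerced value becomes float64; the real 'TAX' column stays int64 because every value parses.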


# **Modeling Median Tax**
The main reasons for selecting TAX as the target variable are:

1. Understanding economic trends

2. Assessing the socioeconomic status of a particular area

3. Developing tax-related housing policies

This code calculates and prints several statistical measures for the 'TAX' column of the DataFrame data:

* Importing statistical functions: mean, median, mode, variance and stdev are imported from Python's built-in statistics module.
* x = data['TAX']: This line selects the 'TAX' column from the DataFrame data and assigns it to the variable x.
* Calculating and printing statistics: the mean, median, mode, variance (sample variance, n-1 denominator) and standard deviation are printed in that order, with the mean, variance and standard deviation rounded to three decimal places.

These measures describe the central tendency (mean, median, mode) and the spread (variance, standard deviation) of the 'TAX' values. Here the mode (666) lies well above the median (330), which shows that one high tax value occurs very frequently in the data.

In [None]:
from statistics import mean,median,mode,variance,stdev
x=data['TAX']
print(round(mean(x),3))
print(median(x))
print(mode(x))
print(round(variance(x),3))
print(round(stdev(x),3))


407.44
330
666
28191.596
167.904
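The same measures are also available as pandas Series methods, which work directly on a DataFrame column. A minimal sketch on a small illustrative Series (not the TAX data); note that both statistics.variance and Series.var() compute the sample variance (n-1 denominator):

```python
import statistics
import pandas as pd

s = pd.Series([1, 2, 2, 3, 7])  # illustrative values only

summary = {
    "mean": s.mean(),
    "median": s.median(),
    "mode": s.mode().iloc[0],  # Series.mode() returns all modes; take the first
    "variance": s.var(),       # sample variance (ddof=1)
    "stdev": s.std(),          # sample standard deviation (ddof=1)
}
print(summary)

# Cross-check against the statistics module (plain Python ints via tolist())
print(statistics.variance(s.tolist()), statistics.stdev(s.tolist()))
```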


# **Interquartile Range for Tax Data**

This cell calculates the interquartile range (IQR) for the 'TAX' column of the DataFrame data.

* import numpy as np: This imports the numpy library, which provides mathematical functions for working with arrays and matrices. It is not strictly needed here, because quantile() is a pandas method.
* q3 = x.quantile(0.75): This calculates the third quartile (75th percentile) of the 'TAX' column x using the pandas quantile() method. It is the value below which 75% of the data falls.
* Similarly, q1 = x.quantile(0.25) calculates the first quartile (25th percentile).
* q = q3 - q1: This computes the interquartile range by subtracting the first quartile from the third quartile.
* The IQR is the range covered by the middle 50% of the data, which makes it a measure of statistical dispersion that is robust to extreme values.

In [None]:


import numpy as np
q3=x.quantile(0.75)
q1=x.quantile(0.25)
q=q3-q1
print(q)

386.5
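pandas' quantile() and numpy's np.percentile() both use linear interpolation between data points by default, so either gives the same IQR. A quick check on a small illustrative Series:

```python
import numpy as np
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])  # illustrative data

# pandas: linear interpolation between order statistics (default)
iqr_pandas = values.quantile(0.75) - values.quantile(0.25)

# numpy: np.percentile also interpolates linearly by default
q75, q25 = np.percentile(values, [75, 25])
iqr_numpy = q75 - q25

print(iqr_pandas, iqr_numpy)  # 6.25 - 2.75 = 3.5 in both cases
```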


In [None]:
#Z-score standardization and min-max normalization of TAX

from pandas import read_csv,DataFrame
from statistics import mean,stdev
data=read_csv('data.csv')
X=data['TAX']
Z=(X-mean(X))/stdev(X)          # z-score: (x - mean) / standard deviation
Y=(X-min(X))/(max(X)-min(X))    # min-max: (x - min) / (max - min), maps values into [0, 1]


df=DataFrame([X,Y,Z])
df=df.transpose()
df.columns=['Original','MinMax','ZScore']
df

Unnamed: 0,Original,MinMax,ZScore
0,296.0,0.208015,-0.663716
1,242.0,0.104962,-0.985330
2,242.0,0.104962,-0.985330
3,222.0,0.066794,-1.104446
4,222.0,0.066794,-1.104446
...,...,...,...
506,320.0,0.253817,-0.520777
507,320.0,0.253817,-0.520777
508,320.0,0.253817,-0.520777
509,329.0,0.270992,-0.467175
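The cell above rescales TAX by hand. scikit-learn provides the same two transformations as MinMaxScaler ((x - min) / (max - min)) and StandardScaler, with one subtlety: StandardScaler divides by the population standard deviation (ddof=0), while statistics.stdev uses the sample standard deviation (ddof=1), so the z-scores differ slightly. A minimal sketch on three illustrative values:

```python
import statistics
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

x = np.array([[1.0], [2.0], [3.0]])  # illustrative single-feature column

# Min-max normalization into [0, 1]
minmax = MinMaxScaler().fit_transform(x).ravel()

# Z-score with the population standard deviation (ddof=0)
z_sklearn = StandardScaler().fit_transform(x).ravel()

# Manual z-score with the sample standard deviation (ddof=1), as in the cell above
vals = x.ravel().tolist()
z_manual = [(v - statistics.mean(vals)) / statistics.stdev(vals) for v in vals]

print(minmax)     # 0, 0.5, 1
print(z_sklearn)  # about -1.2247, 0, 1.2247
print(z_manual)   # -1.0, 0.0, 1.0
```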


***Developing a GLM or Generalized Linear Model***

This cell imports the libraries used for data preprocessing and statistical modeling: StandardScaler and LabelEncoder from sklearn.preprocessing, DataFrame and read_csv from pandas, GLM and add_constant from statsmodels.api (also imported as sm), transpose from numpy, and the warnings module.
warnings.filterwarnings('ignore'): This line suppresses all warnings generated during execution. It keeps the output free of non-critical messages, but it also hides potentially important ones, such as convergence or deprecation warnings.
from sklearn.preprocessing import LabelEncoder: This imports the LabelEncoder class from the sklearn.preprocessing module, which is used for encoding categorical variables as integer labels.

In [None]:
from sklearn.preprocessing import StandardScaler
from pandas import DataFrame,read_csv
from statsmodels.api import GLM, add_constant
import statsmodels.api as sm
import pandas as pd
from numpy import transpose
import warnings
warnings.filterwarnings('ignore')
from sklearn.preprocessing import LabelEncoder

***Feature Selection***

In [None]:
dataset=read_csv('data.csv')
dataset.dropna(inplace=True)
#dataset.drop(columns=['ocean_proximity'],inplace=True)
dataset.shape

(506, 14)

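***Data Preparation***

This cell prepares the dataset for regression analysis. It assigns the whole dataset to X, extracts the 'TAX' column as the response variable Y, and adds a constant (intercept) column to X with add_constant(), which is why X has 15 columns instead of 14. Finally, it outputs the shape of X.
Note that because X is built from the full dataset, it still contains the 'TAX' column itself, so the models below can read the target directly from their inputs (target leakage), which makes their evaluation results unreliable. For a genuine predictive model, 'TAX' should be dropped first, e.g. X=add_constant(dataset.drop(columns=['TAX'])).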

In [None]:
X=dataset
Y=dataset['TAX']
X=add_constant(X)
X.shape

(506, 15)

***Prediction Using the model***

data.head() displays the first five rows of the DataFrame. This is a quick way to inspect the structure of the data, check the column names, and get a sense of the values in each column.

In [None]:
data.head()

Unnamed: 0,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT,MEDV
0,0.00632,18.0,2.31,0,0.538,6.575,65.2,4.09,1,296,15.3,396.9,4.98,24.0
1,0.02731,0.0,7.07,0,0.469,6.421,78.9,4.9671,2,242,17.8,396.9,9.14,21.6
2,0.02729,0.0,7.07,0,0.469,7.185,61.1,4.9671,2,242,17.8,392.83,4.03,34.7
3,0.03237,0.0,2.18,0,0.458,6.998,45.8,6.0622,3,222,18.7,394.63,2.94,33.4
4,0.06905,0.0,2.18,0,0.458,7.147,54.2,6.0622,3,222,18.7,396.9,5.33,36.2


***Missing Values Analysis***

data.isna().sum() counts the missing values in each column of the DataFrame, which is an essential step in data preprocessing and quality assessment. Here only RM has missing values (5 of them), which is why dataset.dropna() above reduced the data to 506 rows.

In [None]:
data.isna().sum()

CRIM       0
ZN         0
INDUS      0
CHAS       0
NOX        0
RM         5
AGE        0
DIS        0
RAD        0
TAX        0
PTRATIO    0
B          0
LSTAT      0
MEDV       0
dtype: int64
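The notebook handles the missing RM values in two ways: dataset.dropna() removes incomplete rows before the GLM, while the SVR, decision tree and Naive Bayes sections fill them with SimpleImputer(strategy='mean'). A minimal sketch contrasting the two on a hypothetical four-row frame (for brevity the imputer is fitted on the whole toy frame, whereas the later sections fit it on the training split only):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

toy = pd.DataFrame({"RM": [6.5, np.nan, 7.0, np.nan], "TAX": [296, 242, 222, 222]})
print(toy.isna().sum())  # RM: 2, TAX: 0

# Listwise deletion: rows with any missing value are dropped
dropped = toy.dropna()

# Mean imputation: each NaN is replaced with its column mean (6.75 for RM here)
imputed = SimpleImputer(strategy="mean").fit_transform(toy)

print(len(dropped), imputed[:, 0])
```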

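***Parameter Extraction and Statistical Summary***

This cell fits a GLM with a Gaussian family and a log link to the data, extracts the model parameters and their p-values, and produces a summary of the model's statistical properties. This helps in understanding the relationships between the predictor variables and the response variable and in assessing the significance of these relationships. Only the last expression of a notebook cell is displayed, so the summary appears only if it is wrapped in print().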

In [None]:
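# Gaussian family with a log link; Log is the capitalized link class (the lowercase log alias is deprecated)
model=GLM(Y,X,family=sm.families.Gaussian(sm.families.links.Log()))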
trained_model=model.fit()
trained_model.summary()

w=round(trained_model.params,2)
pvalues=round(trained_model.pvalues,3)

pvalues

const      0.000
CRIM       1.000
ZN         0.019
INDUS      0.000
CHAS       0.811
NOX        0.000
RM         0.296
AGE        0.727
DIS        0.202
RAD        0.000
TAX        0.000
PTRATIO    0.007
B          0.495
LSTAT      0.585
MEDV       0.937
dtype: float64

**Reducing Predictor Variables:** The code selects a subset of predictor variables from X and stores it in a new DataFrame named X_red. As written, however, the list contains every column of X (including the target 'TAX'), so X_red is identical to X; to actually reduce the model, predictors with large p-values in the output above would be removed from the list.

**Model Fitting:** A GLM with a Gaussian family and a logarithmic link function is fitted to the response variable (Y) and the selected predictors (X_red).

**Parameter and p-value Extraction:** The trained model's parameters (coefficients) and their p-values are extracted and rounded for easier reading: the parameters to two decimal places and the p-values to three.

**Creating a DataFrame:** The parameters and p-values are arranged into a DataFrame with the columns 'weights' (model parameters) and 'pvalues' (their p-values) for subsequent analysis or interpretation.



In [None]:
X_red=X[['const','CRIM','ZN','INDUS','CHAS','NOX','RM','AGE','DIS','RAD','TAX','PTRATIO','B','LSTAT','MEDV']]
model_red= GLM(Y, X_red, family=sm.families.Gaussian(sm.families.links.Log()))
trained_model_red = model_red.fit()
w=round(trained_model_red.params,2)
pvalues=round(trained_model_red.pvalues,3)
df=transpose(DataFrame([w,pvalues]))
df.columns=['weights','pvalues']
df

Unnamed: 0,weights,pvalues
const,5.02,0.0
CRIM,-0.0,1.0
ZN,-0.0,0.019
INDUS,-0.0,0.0
CHAS,-0.0,0.811
NOX,0.11,0.0
RM,-0.0,0.296
AGE,0.0,0.727
DIS,0.0,0.202
RAD,-0.0,0.0
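With a log link the model is E[TAX] = exp(X·w), so each weight acts multiplicatively: exp(w) is the factor by which the expected TAX changes for a one-unit increase in that predictor, other predictors held fixed. A small sketch using two weights copied from the table above (keep in mind that this model also receives TAX itself as an input, and that rounding to two decimals turns small weights into -0.0 or 0.0):

```python
import numpy as np
import pandas as pd

# Two weights copied from the table above
weights = pd.Series({"const": 5.02, "NOX": 0.11})

# Under a log link, exp(weight) is the multiplicative effect on the expected TAX
multipliers = np.exp(weights)
print(multipliers.round(3))  # const about 151.41, NOX about 1.116
```

Read this way, a NOX weight of 0.11 corresponds to roughly an 11.6% higher expected TAX per unit of NOX.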


***Training GLM and Evaluating Predictions***

This cell splits the data into training and testing sets with scikit-learn's train_test_split function (80% training, 20% testing, random_state=22), fits a Generalized Linear Model (GLM) to the training set, makes predictions on the test set, and collects the actual and predicted values in a DataFrame for comparison.

In [None]:
from sklearn.model_selection import train_test_split
X=add_constant(X)  # no-op here: X already has a 'const' column, which add_constant skips by default

X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.2,random_state=22)


model_train=GLM(Y_train,X_train, family=sm.families.Gaussian(sm.families.links.Log()))

f_model=model_train.fit()


p_full=f_model.predict(X_test)
d=DataFrame([Y_test,p_full]).transpose()
d.columns=['Actual','Prediction']
d

Unnamed: 0,Actual,Prediction
508,320.0,306.895550
183,193.0,234.331737
499,391.0,373.269132
227,307.0,301.735773
29,307.0,304.125459
...,...,...
439,666.0,670.718676
481,666.0,658.914096
509,329.0,305.212001
382,666.0,667.344906


***Root Mean Squared Error (RMSE) for Regression Models***

1. RMSE is the square root of the mean of the squared differences between predicted and actual values. It summarizes the performance of a regression model in a single number, with lower values indicating better performance; because errors are squared before averaging, large errors weigh more heavily than small ones.

2. Taking the square root puts the RMSE in the same units as the target variable (here, TAX), which makes it easier to interpret.

Overall, RMSE is a useful metric for assessing how closely a regression model's predictions match the actual data.




In [None]:


from sklearn.metrics import mean_squared_error
from math import sqrt
rmse=sqrt(mean_squared_error(Y_test,p_full))
rmse

19.641458753560965
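The RMSE formula can be checked by hand against scikit-learn on a few made-up values:

```python
import math
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0, 2.0])
y_hat = np.array([1.0, 5.0, 3.0])

# RMSE = sqrt(mean((y_true - y_hat)^2)); errors 2, 0, -1 give sqrt(5/3)
rmse_manual = np.sqrt(np.mean((y_true - y_hat) ** 2))
rmse_sklearn = math.sqrt(mean_squared_error(y_true, y_hat))
print(rmse_manual, rmse_sklearn)  # both about 1.291
```

For comparison with the MSE values reported for later models, the GLM's RMSE of about 19.64 corresponds to an MSE of about 386.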

In [None]:
!pip install imbalanced-learn



***Class Distribution Counting with Counter***

1. import imblearn: The imbalanced-learn library provides tools for dealing with imbalanced datasets, i.e. datasets in which the classes are not represented equally.
from collections import Counter: This imports the Counter class from the collections module, which counts how often each element occurs in an iterable such as a list or a pandas Series.

2. Counting Class Distribution:
Counter(X) counts the occurrences of each element produced by iterating over X. Because X is a DataFrame, iterating over it yields its column labels, so the result simply lists each column name once, as the output below shows. To see how often each target value occurs, the Series Y should be counted instead (Counter(Y)).


In [None]:
import imblearn
from collections import Counter
Counter(X)

Counter({'const': 1,
         'CRIM': 1,
         'ZN': 1,
         'INDUS': 1,
         'CHAS': 1,
         'NOX': 1,
         'RM': 1,
         'AGE': 1,
         'DIS': 1,
         'RAD': 1,
         'TAX': 1,
         'PTRATIO': 1,
         'B': 1,
         'LSTAT': 1,
         'MEDV': 1})
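The difference between counting a DataFrame and counting a Series can be seen on a small hypothetical frame:

```python
from collections import Counter
import pandas as pd

df = pd.DataFrame({"TAX": [666, 307, 666], "RM": [6.5, 6.4, 7.1]})

# Iterating over a DataFrame yields its column labels, so this counts column names
counts_columns = Counter(df)
print(counts_columns)  # each column name counted once

# Iterating over a Series yields its values, which is what a class distribution needs
counts_values = Counter(df["TAX"])
print(counts_values)   # 666 twice, 307 once

print(df["TAX"].value_counts())  # pandas equivalent
```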

In [None]:


from sklearn.metrics import accuracy_score, recall_score, confusion_matrix
from sklearn.linear_model import LogisticRegression
pred=p_full


print(pred)

508    306.895550
183    234.331737
499    373.269132
227    301.735773
29     304.125459
          ...    
439    670.718676
481    658.914096
509    305.212001
382    667.344906
446    671.479175
Length: 102, dtype: float64


In [None]:
from imblearn.under_sampling import RandomUnderSampler
undersample = RandomUnderSampler(sampling_strategy='majority')
# fit_resample() returns the resampled features and target;
# fit() on its own only returns the fitted sampler object
X_under, Y_under = undersample.fit_resample(X_train, Y_train)

print(Counter(Y_train))
print(Counter(Y_test))


Counter({666: 105, 307: 33, 403: 24, 304: 11, 437: 11, 264: 9, 384: 9, 330: 9, 398: 9, 276: 8, 277: 8, 432: 8, 296: 7, 224: 6, 287: 6, 270: 6, 222: 6, 329: 6, 233: 6, 311: 6, 188: 5, 216: 5, 391: 5, 284: 5, 193: 5, 273: 5, 254: 4, 300: 4, 247: 4, 243: 4, 289: 4, 223: 3, 345: 3, 430: 3, 281: 3, 245: 3, 711: 3, 358: 3, 305: 3, 370: 2, 348: 2, 279: 2, 320: 2, 352: 2, 265: 2, 315: 2, 252: 2, 335: 2, 402: 2, 337: 2, 187: 1, 244: 1, 242: 1, 280: 1, 334: 1, 293: 1, 411: 1, 256: 1, 313: 1, 241: 1, 198: 1, 255: 1, 226: 1, 469: 1, 285: 1})
Counter({666: 27, 307: 7, 403: 6, 224: 4, 193: 3, 391: 3, 304: 3, 437: 3, 398: 3, 264: 3, 233: 3, 277: 3, 300: 3, 188: 2, 223: 2, 711: 2, 293: 2, 287: 2, 384: 2, 320: 1, 422: 1, 242: 1, 330: 1, 411: 1, 222: 1, 254: 1, 289: 1, 284: 1, 281: 1, 279: 1, 305: 1, 334: 1, 351: 1, 270: 1, 432: 1, 296: 1, 345: 1, 329: 1})


In [None]:
from sklearn.metrics import accuracy_score, recall_score, confusion_matrix
import numpy as np

print(Y_test,pred)
Y_test = np.array(Y_test)
pred = np.array(pred)
assert len(Y_test) == len(pred)
unique_labels_Y_test = np.unique(Y_test)
unique_labels_pred = np.unique(pred)

print("Unique labels in Y_test:", unique_labels_Y_test)
print("Unique labels in pred:", unique_labels_pred)

# confusion_matrix expects discrete class labels; the GLM predictions are continuous,
# so scikit-learn raises a ValueError, which is caught and printed here
try:
    cm = confusion_matrix(Y_test, pred)
    print("Confusion matrix:", cm)
except ValueError as ve:
    print("ValueError occurred:", ve)

508    320
183    193
499    391
227    307
29     307
      ... 
439    666
481    666
509    329
382    666
446    666
Name: TAX, Length: 102, dtype: int64 508    306.895550
183    234.331737
499    373.269132
227    301.735773
29     304.125459
          ...    
439    670.718676
481    658.914096
509    305.212001
382    667.344906
446    671.479175
Length: 102, dtype: float64
Unique labels in Y_test: [188 193 222 223 224 233 242 254 264 270 277 279 281 284 287 289 293 296
 300 304 305 307 320 329 330 334 345 351 384 391 398 403 411 422 432 437
 666 711]
Unique labels in pred: [216.86397728 216.95305061 233.99466076 234.16291202 234.33173673
 245.42379068 247.02900475 249.07952979 249.10997844 249.88923507
 250.19070672 252.59138449 254.92793834 256.56395006 256.77114638
 262.53646724 263.19171266 268.6750375  278.59511937 279.00208449
 280.25077801 280.7000122  280.86155074 281.48507472 282.05564106
 284.16275553 285.21176083 288.89899789 289.15594403 289.38720968
 289.47536403 29
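The cell above passes continuous GLM predictions to confusion_matrix, which expects discrete class labels. A minimal sketch showing that scikit-learn rejects this combination, and that regression metrics are the appropriate comparison instead (the three values are rounded from the outputs above):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, mean_absolute_error, r2_score

y_true = np.array([320, 193, 391])
y_pred = np.array([306.9, 234.3, 373.3])  # continuous regression outputs

raised = False
try:
    confusion_matrix(y_true, y_pred)
except ValueError as err:
    raised = True
    print("confusion_matrix rejects continuous predictions:", err)

# Regression metrics work directly on continuous predictions
print("MAE:", mean_absolute_error(y_true, y_pred))
print("R^2:", r2_score(y_true, y_pred))
```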

In [None]:
X=dataset.drop('MEDV',axis=1)
y=dataset['MEDV']

X.shape

(506, 13)

# **SVM MODEL**


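1. **SVR Initialization and Training:** An SVR regressor with a radial basis function (RBF) kernel is initialized and trained on the training data (X_train and Y_train from the GLM section above, which still include the 'const' and 'TAX' columns).

2. **Making Predictions:** The trained SVR model generates predictions on the test data.

3. **Model Evaluation:** The mean squared error (MSE) compares the predicted values with the actual values of the target variable. Because MSE is in squared units, its square root is needed for a comparison with the GLM's RMSE: an MSE of about 18802 corresponds to an RMSE of about 137, far worse than the GLM's 19.6. The RBF kernel depends on distances between samples, so the unstandardized features used here likely contribute to this result.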

In [None]:
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

In [None]:
svm_regressor = SVR(kernel='rbf')
svm_regressor.fit(X_train, Y_train)
y_pred = svm_regressor.predict(X_test)
print(y_pred)

[331.75991012 301.43637105 348.72912062 325.86873212 325.7215895
 425.41072551 348.42158921 358.19369201 324.4475329  353.24861579
 302.25118659 426.02573621 363.72806211 353.33073112 356.07068894
 323.94556694 362.51929978 337.46587375 309.28268966 352.1678094
 427.76421363 313.69530169 325.10366284 426.85860589 301.10164131
 300.54662276 307.71901639 331.06956331 426.16705245 354.81796608
 305.11364679 364.02749259 317.48405511 427.76955908 325.54085697
 425.83976544 426.10348014 433.62817861 320.03314176 305.19772919
 318.85584243 426.08187821 427.00422004 425.98734336 432.92543176
 322.31388234 354.86091131 372.94934803 319.10034092 425.20078772
 347.91612776 311.51417835 322.39002485 426.11919313 426.13447572
 305.64569262 319.7560087  305.50816008 319.08045605 318.50171327
 317.09425737 317.01194915 325.66366266 321.8280197  305.80110637
 426.14355915 426.43917506 427.03028562 359.65115951 349.54092196
 431.9044794  305.55414277 314.24134324 323.78555299 314.73970175
 324.3925948

In [None]:
mse = mean_squared_error(Y_test, y_pred)
print("Mean Squared Error:", mse)

Mean Squared Error: 18801.69395533629


In [None]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_squared_error

X = data.drop(columns=['TAX'])
y = data['TAX']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
imputer = SimpleImputer(strategy='mean')
X_train_imputed = imputer.fit_transform(X_train)
X_test_imputed = imputer.transform(X_test)
svm_regressor = SVR()
svm_regressor.fit(X_train_imputed, y_train)
y_pred = svm_regressor.predict(X_test_imputed)
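An RBF-kernel SVR is sensitive to feature scale, and the SVR cells fit it on unscaled features. A minimal sketch of chaining imputation, standardization and SVR in a scikit-learn Pipeline, so that every step is fitted on the training data only; it uses synthetic data with features on very different scales (an assumption standing in for columns such as CRIM and B), and the regularization parameter C (default 1.0) and epsilon would likely also need tuning for a target in the hundreds, as TAX is:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic features on very different scales
rng = np.random.default_rng(42)
X_demo = np.column_stack([rng.uniform(0, 1, 300), rng.uniform(0, 400, 300)])
y_demo = 200 + 300 * X_demo[:, 0] + 0.5 * X_demo[:, 1] + rng.normal(0, 10, 300)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Impute -> standardize -> SVR; fit() fits each step on the training split only
svr_pipeline = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler(), SVR(kernel="rbf"))
svr_pipeline.fit(X_tr, y_tr)
pred = svr_pipeline.predict(X_te)
print("MSE:", mean_squared_error(y_te, pred))
```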




# **DECISION TREE MODEL**

1. **Data Loading and Preparation:** The dataset loaded from the CSV file earlier (data) is split into the features (X) and the target variable (TAX).

2. **Data Splitting:** The dataset is divided into training and testing sets (80% and 20%) using scikit-learn's train_test_split function.

3. **Missing Values:** Missing values in the features are filled with SimpleImputer's mean imputation.

4. **Model Training and Prediction:** A DecisionTreeRegressor is trained on the imputed training data and then used to predict 'TAX' for the test set.

5. **Model Evaluation:** The Mean Squared Error (MSE) measures the difference between the predicted and actual 'TAX' values and is printed to the console, followed by a classification report that treats each distinct 'TAX' value as a class.

In [None]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVC

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_squared_error, classification_report

X = data.drop(columns=['TAX'])
y = data['TAX']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

imputer = SimpleImputer(strategy='mean')
X_train_imputed = imputer.fit_transform(X_train)
X_test_imputed = imputer.transform(X_test)

tree_regressor = DecisionTreeRegressor()

tree_regressor.fit(X_train_imputed, y_train)

y_pred = tree_regressor.predict(X_test_imputed)

mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
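# classification_report treats each distinct TAX value as a class; it runs here because
# the unpruned tree's predictions happen to be TAX values seen in training
print(classification_report(y_test, y_pred))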



Mean Squared Error: 676.4077669902913
              precision    recall  f1-score   support

       188.0       1.00      1.00      1.00         1
       193.0       1.00      1.00      1.00         2
       216.0       0.50      0.50      0.50         2
       224.0       0.50      1.00      0.67         1
       226.0       0.00      0.00      0.00         1
       233.0       1.00      1.00      1.00         1
       241.0       0.00      0.00      0.00         1
       242.0       1.00      1.00      1.00         1
       245.0       0.00      0.00      0.00         0
       247.0       1.00      1.00      1.00         1
       252.0       1.00      1.00      1.00         1
       254.0       1.00      0.50      0.67         2
       264.0       1.00      1.00      1.00         1
       270.0       1.00      1.00      1.00         2
       273.0       1.00      1.00      1.00         2
       276.0       0.50      1.00      0.67         1
       281.0       0.50      1.00      0.67
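A DecisionTreeRegressor with default settings keeps splitting until its leaves are pure, and each leaf predicts the mean of the training targets that fall into it. With pure leaves the predictions are therefore typically values already seen in training, which is why classification_report could treat them as class labels. A toy sketch on hypothetical data:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X_toy = np.array([[0], [1], [2], [3]])
y_toy = np.array([1.0, 3.0, 10.0, 12.0])

# A depth-1 tree splits once (between 1 and 2); each leaf predicts its mean target
stump = DecisionTreeRegressor(max_depth=1).fit(X_toy, y_toy)
print(stump.predict(X_toy))      # 2, 2, 11, 11

# An unpruned tree keeps splitting until leaves are pure, reproducing the training targets
full_tree = DecisionTreeRegressor(random_state=0).fit(X_toy, y_toy)
print(full_tree.predict(X_toy))  # 1, 3, 10, 12
```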

# **NAIVE BAYES MODEL**

1. **Data Loading and Preparation:** The dataset loaded from the CSV file earlier (data) is split into the features (X) and the target variable (TAX).

2. **Data Splitting:** The dataset is divided into training and testing sets with scikit-learn's train_test_split function, so the model is trained on one part of the data and evaluated on another.

3. **Missing Values:** Mean imputation handles missing values in the features: scikit-learn's SimpleImputer replaces each missing value with the mean of that feature in the training set.

4. **Model Training:** A Gaussian Naive Bayes model (GaussianNB) is initialized and trained with the fit method on the imputed training features (X_train_imputed) and the target variable (y_train). GaussianNB is a classifier, not a regressor: it treats each distinct 'TAX' value as a separate class.

5. **Predictions:** The trained model predicts 'TAX' for the test set with the predict method, and the predictions are stored in y_pred.

6. **Model Evaluation:** The Mean Squared Error (MSE) measures the difference between predicted and actual 'TAX' values. In addition, a confusion matrix and a classification report provide detailed metrics, including precision, recall and F1-score for each 'TAX' value.
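Because GaussianNB is a classifier, it can only ever predict one of the target values it saw during training. A toy sketch on hypothetical data, where the two distinct 'TAX'-like values become the two classes:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X_toy = np.array([[0.1], [0.2], [0.9], [1.0], [0.95]])
y_toy = np.array([222, 222, 666, 666, 666])

clf = GaussianNB().fit(X_toy, y_toy)
print(clf.classes_)  # each distinct target value becomes a class: 222, 666

pred = clf.predict(np.array([[0.15], [0.97], [0.5]]))
print(pred)          # every prediction is one of clf.classes_
```

With the real data, many 'TAX' values occur only once or twice in the training set, so those classes are estimated from very few samples.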

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_squared_error
from sklearn.metrics import classification_report, confusion_matrix


X = data.drop(columns=['TAX'])
y = data['TAX']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
imputer = SimpleImputer(strategy='mean')
X_train_imputed = imputer.fit_transform(X_train)
X_test_imputed = imputer.transform(X_test)
# GaussianNB is a classifier: each distinct TAX value is treated as a separate class
nb_classifier = GaussianNB()
nb_classifier.fit(X_train_imputed, y_train)
y_pred = nb_classifier.predict(X_test_imputed)
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(classification_report(y_test, y_pred))


Mean Squared Error: 1002.495145631068
Confusion Matrix:
[[ 1  0  0 ...  0  0  0]
 [ 0  2  0 ...  0  0  0]
 [ 0  0  1 ...  0  0  0]
 ...
 [ 0  0  0 ...  3  0  0]
 [ 0  0  0 ...  0 28  0]
 [ 0  0  0 ...  0  0  1]]
Classification Report:
              precision    recall  f1-score   support

         188       1.00      1.00      1.00         1
         193       1.00      1.00      1.00         2
         216       1.00      0.50      0.67         2
         224       1.00      1.00      1.00         1
         226       0.00      0.00      0.00         1
         233       1.00      1.00      1.00         1
         241       0.00      0.00      0.00         1
         242       0.00      0.00      0.00         1
         247       1.00      1.00      1.00         1
         252       0.00      0.00      0.00         1
         254       1.00      1.00      1.00         2
         264       1.00      1.00      1.00         1
         270       1.00      1.00      1.00         2
        