# Module 1 Project

## Project overview


## Select a problem based on your specialization in ECE:



**Communications/electronics engineering**: Antenna design

_Dataset:_ `project_datasets/comm_antenna.csv`

_Dataset description:_ This is a dataset of different antenna designs. Columns:
1. TestFreq (frequency used for testing the signal strength)
2. PatchLength (length of patch antenna in mm)
3. PatchWidth (width of patch antenna in mm)
4. SlotLength (length of slot in antenna in mm)
5. SlotWidth (width of slot in antenna in mm)
6. Strength (signal strength in dB, higher is better)

_Problem:_ Is it possible to create a statistical model that can estimate signal strength based on these parameters? Additionally, is it possible to create a model that uses only the parameters other than the test frequency? What are the best accuracies of your statistical models?


Part a: Is it possible to create a statistical model that can estimate signal strength based on these parameters?

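Answer: Yes, it is possible to create a statistical model that can estimate signal strength based on these parameters. When all parameters, including TestFreq, are used, the tree-based models reach an R^2 above 0.90 (see the output of cell 11).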


In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.linear_model import LinearRegression, ElasticNet, Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

In [3]:
#reading dataset
data = pd.read_csv('/content/drive/MyDrive/Dataset/comm_antenna.csv')

In [4]:
data.head()

   TestFreq  PatchLength  PatchWidth  SlotLength  SlotWidth   Strength
0       1.5         33.0          33         0.0          0  -4.927274
1  1.551724         33.0          33         0.0          0  -5.077877
2  1.603448         33.0          33         0.0          0  -5.183708
3  1.655172         33.0          33         0.0          0  -5.215997
4  1.706897         33.0          33         0.0          0  -5.120009


In [5]:
# Check the number of unique values in each column
data.nunique()

TestFreq        279
PatchLength       5
PatchWidth        5
SlotLength        6
SlotWidth         6
Strength       1266
dtype: int64

Part b: Additionally, is it possible to create a model that uses only the parameters other than the test frequency?

In [6]:
X1 = data.drop(['TestFreq','Strength'], axis=1)
y1 = data['Strength']
X_train1, X_test1, y_train1, y_test1 = train_test_split(X1, y1, test_size=0.3)

In [7]:
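# Fit each candidate regressor with default hyperparameters and score it on the held-out test set
models = [LinearRegression, ElasticNet, Lasso, DecisionTreeRegressor, RandomForestRegressor]
for model in models:
    reg = model()
    reg.fit(X_train1, y_train1)
    pred1 = reg.predict(X_test1)
    err1 = mean_squared_error(y_test1, pred1) ** .5  # RMSE is the square root of the MSE
    print(f'RMSE of {model.__name__} model is: {err1}')
    # r2_score already returns a scalar, so wrapping it in np.mean is unnecessary
    print(f'R2 value of {model.__name__} is: {r2_score(y_test1, pred1)}')
    print('*'*50)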

RMSE of LinearRegression model is: 2.676241886875816
R2 value of LinearRegression is: 0.28721183022604246
**************************************************
RMSE of ElasticNet model is: 2.7188484952980243
R2 value of ElasticNet is: 0.26433554803939263
**************************************************
RMSE of Lasso model is: 2.762681278365111
R2 value of Lasso is: 0.24042384111534953
**************************************************
RMSE of DecisionTreeRegressor model is: 2.6278146757974685
R2 value of DecisionTreeRegressor is: 0.31277456777177237
**************************************************
RMSE of RandomForestRegressor model is: 2.624496871014825
R2 value of RandomForestRegressor is: 0.314508815216624
**************************************************
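For reference, the two metrics printed above can be reproduced by hand. The sketch below uses the standard definitions (RMSE as the square root of the mean squared residual, R^2 as one minus the ratio of residual to total sum of squares). The helper names `rmse` and `r_squared` and the sample values are made up for illustration and do not come from the antenna dataset.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return float(1.0 - ss_res / ss_tot)


# Made-up values for illustration only (not taken from the antenna dataset).
y_true_demo = [-5.0, -4.0, -6.0, -3.0]
y_pred_demo = [-4.5, -4.0, -5.0, -3.5]

demo_rmse = rmse(y_true_demo, y_pred_demo)
demo_r2 = r_squared(y_true_demo, y_pred_demo)
print(f'RMSE = {demo_rmse:.3f}, R^2 = {demo_r2:.3f}')  # RMSE = 0.612, R^2 = 0.700
```

RMSE is in the same unit as Strength (dB), while R^2 is unitless, which is why the notebook reports both.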


Answer: Yes, such a model can be built, but it performs poorly because the test frequency plays a vital role in estimating signal strength. Without TestFreq, the best of the five regression models above reaches an R^2 of only about 0.31. The same five models are retrained with TestFreq in cells 10 and 11 below, and the difference in R^2 makes clear how important this attribute is for the statistical models.
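One way to check how much a model leans on TestFreq is to inspect the feature importances of a fitted random forest. The sketch below is only an illustration: the helper `rank_feature_importance` is hypothetical, and the data frame is synthetic stand-in data with an assumed frequency range and an assumed resonance-like dip, not the antenna dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def rank_feature_importance(df, target='Strength', random_state=0):
    """Fit a random forest and return impurity-based importances, largest first."""
    X = df.drop(columns=[target])
    y = df[target]
    forest = RandomForestRegressor(n_estimators=100, random_state=random_state)
    forest.fit(X, y)
    return pd.Series(forest.feature_importances_, index=X.columns).sort_values(ascending=False)


# Synthetic stand-in data (assumption): Strength dips sharply near one frequency
# and depends only weakly on the geometry columns.
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    'TestFreq': rng.uniform(1.5, 4.5, n),
    'PatchLength': rng.choice([33.0, 35.0, 37.0], n),
    'SlotWidth': rng.choice([0, 2, 4], n),
})
demo['Strength'] = (
    -20.0 * np.exp(-((demo['TestFreq'] - 2.4) ** 2) / 0.05)
    - 0.1 * demo['SlotWidth']
    + rng.normal(0.0, 0.1, n)
)

importances = rank_feature_importance(demo)
print(importances)
```

On the real data the same helper could be called as `rank_feature_importance(data)`. Impurity-based importances tend to favor columns with many distinct values, so `sklearn.inspection.permutation_importance` on a held-out set may be a fairer check.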

What is the maximum accuracy of your statistical model?

In [8]:
'''Answer: Without TestFreq, the best R^2 is about 0.31 (RandomForestRegressor). With TestFreq included,
the best R^2 rises to about 0.91, as shown in the cells below.'''

'Answer: Without TestFreq, the best R^2 is about 0.31 (RandomForestRegressor). With TestFreq included,\nthe best R^2 rises to about 0.91, as shown in the cells below.'

In [9]:
new_data = pd.read_csv('/content/drive/MyDrive/Dataset/comm_antenna.csv')

In [10]:
X = new_data.drop(['Strength'], axis=1)
y = new_data['Strength']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

In [11]:
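# Fit the same five regressors, this time with TestFreq included as a feature
models = [LinearRegression, ElasticNet, Lasso, DecisionTreeRegressor, RandomForestRegressor]
for model in models:
    reg = model()
    reg.fit(X_train, y_train)
    pred = reg.predict(X_test)
    err = mean_squared_error(y_test, pred) ** .5  # RMSE is the square root of the MSE
    print(f'RMSE of {model.__name__} model is: {err}')
    # r2_score already returns a scalar, so wrapping it in np.mean is unnecessary
    print(f'R2 value of {model.__name__} is: {r2_score(y_test, pred)}')
    print('*'*50)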

RMSE of LinearRegression model is: 2.7611782684314625
R2 value of LinearRegression is: 0.25005662488174607
**************************************************
RMSE of ElasticNet model is: 2.7941026554043513
R2 value of ElasticNet is: 0.2320652935067815
**************************************************
RMSE of Lasso model is: 2.811382906304811
R2 value of Lasso is: 0.2225372690799633
**************************************************
RMSE of DecisionTreeRegressor model is: 0.9954226952977892
R2 value of DecisionTreeRegressor is: 0.9025336603929535
**************************************************
RMSE of RandomForestRegressor model is: 0.9306109888387203
R2 value of RandomForestRegressor is: 0.9148124883454711
**************************************************
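The two experiments above each call `train_test_split` without a `random_state`, so they are scored on different random test sets and the numbers change from run to run. A possible way to make the comparison more stable is to score both feature sets on the same cross-validation folds. The sketch below assumes a hypothetical helper `compare_feature_sets` and runs it on synthetic stand-in data, not on the antenna dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score


def compare_feature_sets(df, target='Strength', drop_col='TestFreq', seed=0):
    """Return the mean 5-fold R^2 with and without `drop_col`, using identical folds."""
    y = df[target]
    X_full = df.drop(columns=[target])
    X_reduced = X_full.drop(columns=[drop_col])
    folds = KFold(n_splits=5, shuffle=True, random_state=seed)  # fixed seed: same folds for both runs
    scores = {}
    for name, X in [(f'with {drop_col}', X_full), (f'without {drop_col}', X_reduced)]:
        model = RandomForestRegressor(n_estimators=50, random_state=seed)
        scores[name] = float(cross_val_score(model, X, y, cv=folds, scoring='r2').mean())
    return scores


# Synthetic stand-in data (assumption): a sharp frequency-dependent dip plus small noise.
rng = np.random.default_rng(1)
n = 500
cv_demo = pd.DataFrame({
    'TestFreq': rng.uniform(1.5, 4.5, n),
    'PatchLength': rng.choice([33.0, 35.0, 37.0], n),
    'SlotWidth': rng.choice([0, 2, 4], n),
})
cv_demo['Strength'] = (
    -20.0 * np.exp(-((cv_demo['TestFreq'] - 2.4) ** 2) / 0.05)
    + rng.normal(0.0, 0.1, n)
)

cv_demo_scores = compare_feature_sets(cv_demo)
print(cv_demo_scores)
```

On the real data this could be called as `compare_feature_sets(data)`. Averaging over five folds gives a less noisy estimate than a single 70/30 split.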


In [12]:
'''DecisionTreeRegressor and RandomForestRegressor are the best of these five models. RandomForestRegressor performs slightly better, with the lowest RMSE (about 0.93) and an R^2 above 0.90, which indicates a good fit.'''

'DecisionTreeRegressor and RandomForestRegressor are the best of these five models. RandomForestRegressor performs slightly better, with the lowest RMSE (about 0.93) and an R^2 above 0.90, which indicates a good fit.'
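In the results above, the linear models gain almost nothing from TestFreq while the tree-based models improve sharply, which would be consistent with a non-linear dependence of strength on frequency. The sketch below illustrates that effect on synthetic data only: the resonance-like dip, its position, and the noise level are all assumptions, not properties measured from the antenna dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic single-feature data (assumption): a narrow dip centered at 3.0 plus noise.
rng = np.random.default_rng(0)
freq = rng.uniform(1.5, 4.5, size=(1000, 1))
strength = -20.0 * np.exp(-((freq[:, 0] - 3.0) ** 2) / 0.05) + rng.normal(0.0, 0.2, size=1000)

f_train, f_test, s_train, s_test = train_test_split(
    freq, strength, test_size=0.3, random_state=0
)

# Fit one linear and one tree model on the same split and compare test-set R^2.
nonlinear_demo_r2 = {}
for model in (LinearRegression(), DecisionTreeRegressor(random_state=0)):
    model.fit(f_train, s_train)
    nonlinear_demo_r2[type(model).__name__] = float(r2_score(s_test, model.predict(f_test)))

print(nonlinear_demo_r2)
```

A straight line cannot follow a localized dip, so its R^2 stays low, whereas a tree can split the frequency axis into small intervals and follow the curve.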