**Importing Required Libraries**

In [18]:
import numpy as np
import pandas as pd

# **LOADING DATASET**

In [19]:
data=pd.read_csv('tidalfnal.csv')
data

,Age,Height (cm),Weight (kg),Gender,Smoking,Having Lung Disease,Tidal Volume (mL)
0,23,168,70,Male,No,No,500
1,45,162,65,Female,Yes,No,400
2,34,175,80,Male,No,No,530
3,29,160,58,Female,No,No,450
4,50,170,78,Male,Yes,Yes,420
...,...,...,...,...,...,...,...
512,57,158,64,Female,No,No,370
513,60,155,61,Female,No,Yes,360
514,55,158,61,Female,Yes,No,370
515,59,155,61,Female,Yes,No,360


In [20]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 517 entries, 0 to 516
Data columns (total 7 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Age                  517 non-null    int64 
 1   Height (cm)          517 non-null    int64 
 2   Weight (kg)          517 non-null    int64 
 3   Gender               517 non-null    object
 4   Smoking              517 non-null    object
 5   Having Lung Disease  517 non-null    object
 6   Tidal Volume (mL)    517 non-null    int64 
dtypes: int64(4), object(3)
memory usage: 28.4+ KB


In [21]:
data.describe()

,Age,Height (cm),Weight (kg),Tidal Volume (mL)
count,517.0,517.0,517.0,517.0
mean,48.77176,164.595745,67.93617,419.941973
std,11.573415,8.739169,8.193031,74.27033
min,23.0,155.0,58.0,350.0
25%,36.0,158.0,61.0,370.0
50%,55.0,160.0,64.0,370.0
75%,58.0,174.0,75.0,510.0
max,62.0,182.0,85.0,550.0


# **UNDERSTANDING THE DATASET AND DRAWING INFERENCES**

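Plotly is used here because its charts are interactive by default (hover, zoom, pan), whereas matplotlib renders static images in a notebook unless an interactive backend is enabled.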

In [22]:
import plotly.express as px
fig = px.histogram(data,
                   x='Age',
                   marginal='box',
                   title='Distribution of Age')
fig.update_layout(bargap=0.5)
fig.show()

In [23]:
fig = px.histogram(data,
                   x='Height (cm)',
                   marginal='box',
                   title='Distribution of Height')
fig.update_layout(bargap=0.5)
fig.show()

In [24]:
fig = px.histogram(data,
                   x='Tidal Volume (mL)',
                   marginal='box',
                   color='Gender',
                   title='Distribution of volume')
fig.update_layout(bargap=0.5)
fig.show()

In [25]:
data['Gender'].value_counts()

Gender
Female    358
Male      159
Name: count, dtype: int64

The dataset contains more than twice as many female records (358) as male records (159). This imbalance is a potential drawback: the model may predict less accurately for males.
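To put a number on the imbalance, the counts can be turned into shares. This is a minimal sketch that hard-codes the two counts from the `value_counts()` output instead of reading the CSV.

```python
# Gender counts copied from the value_counts() output above
counts = {"Female": 358, "Male": 159}

total = sum(counts.values())
shares = {gender: n / total for gender, n in counts.items()}

for gender, share in shares.items():
    print(f"{gender}: {share:.1%}")
```

Roughly 69% of the rows are female and 31% male, so errors on male subjects carry less weight in an average loss computed over the whole dataset.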

# **UNDERSTANDING THE RELATIONSHIP BETWEEN PARAMETERS**

In [26]:
fig = px.scatter(data,
                 x='Age',
                 y='Tidal Volume (mL)',
                 color='Gender',
                 opacity=0.5,
                 hover_data=['Smoking'],
                 title='Age vs. Volume')
fig.update_traces(marker_size=5)
fig.show()

The plot shows that younger subjects tend to have a higher tidal volume than older subjects, which is plausible. The dataset is far from uniform, however: the male records come mostly from younger subjects and the female records mostly from older ones, so the effects of age and gender are hard to separate. Overall, tidal volume appears to fall roughly linearly with age.

In [27]:
fig = px.scatter(data,
                 x='Weight (kg)',
                 y='Tidal Volume (mL)',
                 color='Smoking',
                 opacity=0.5,
                 hover_data=['Gender'],
                 title='Weight vs. volume')
fig.update_traces(marker_size=5)
fig.show()

As with age, weight shows a roughly linear relationship with tidal volume.

In [28]:
fig = px.scatter(data,
                 x='Height (cm)',
                 y='Tidal Volume (mL)',
                 color='Smoking',
                 opacity=0.5,
                 hover_data=['Gender'],
                 title='Height vs. volume')
fig.update_traces(marker_size=5)
fig.show()

Based on the plots above, linear regression is most likely a suitable model.
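The visual impression of linearity can be backed up with Pearson correlation coefficients. The sketch below uses only the nine rows visible in the DataFrame preview, so the values are illustrative and not the statistics of the full dataset; `pearson_r` is a hypothetical helper name.

```python
import numpy as np

# The nine rows visible in the DataFrame preview (not the full dataset)
age    = np.array([23, 45, 34, 29, 50, 57, 60, 55, 59])
height = np.array([168, 162, 175, 160, 170, 158, 155, 158, 155])
weight = np.array([70, 65, 80, 58, 78, 64, 61, 61, 61])
tidal  = np.array([500, 400, 530, 450, 420, 370, 360, 370, 360])

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return np.corrcoef(x, y)[0, 1]

r_age = pearson_r(age, tidal)
r_height = pearson_r(height, tidal)
r_weight = pearson_r(weight, tidal)
print(f"age: {r_age:.2f}, height: {r_height:.2f}, weight: {r_weight:.2f}")
```

On these rows the correlation with age is negative and the correlations with height and weight are positive, matching the scatter plots. On the full DataFrame, `data.corr(numeric_only=True)` should give the same measure for every numeric pair.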

# **CONVERTING CATEGORICAL DATA TO INT TYPE**

In [29]:
smoker_codes = {'No': 0, 'Yes': 1}
data['Smoker_code'] = data["Smoking"].map(smoker_codes)

In [30]:
disease_codes = {'No': 0, 'Yes': 1}
data['disease'] = data["Having Lung Disease"].map(disease_codes)

In [31]:
Gender_codes = {'Female': 0, 'Male': 1}
data['Gender_codes'] = data["Gender"].map(Gender_codes)
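One thing to watch with `map` and a dictionary: any label missing from the dictionary becomes `NaN` without raising an error. The sketch below shows this on a toy column (not the real data) that contains a lowercase `"no"`.

```python
import pandas as pd

# Toy column with one label ("no") that is missing from the mapping
smoking = pd.Series(["No", "Yes", "no"])
smoker_codes = {"No": 0, "Yes": 1}

coded = smoking.map(smoker_codes)
n_unmapped = int(coded.isna().sum())  # number of labels that were not mapped
print(n_unmapped)
```

In this notebook the later `data.info()` output reports 517 non-null `int64` values for all three coded columns, which indicates that every label was mapped.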

In [32]:
data.describe()

,Age,Height (cm),Weight (kg),Tidal Volume (mL),Smoker_code,disease,Gender_codes
count,517.0,517.0,517.0,517.0,517.0,517.0,517.0
mean,48.77176,164.595745,67.93617,419.941973,0.365571,0.261122,0.307544
std,11.573415,8.739169,8.193031,74.27033,0.482056,0.439672,0.461923
min,23.0,155.0,58.0,350.0,0.0,0.0,0.0
25%,36.0,158.0,61.0,370.0,0.0,0.0,0.0
50%,55.0,160.0,64.0,370.0,0.0,0.0,0.0
75%,58.0,174.0,75.0,510.0,1.0,1.0,1.0
max,62.0,182.0,85.0,550.0,1.0,1.0,1.0


In [33]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 517 entries, 0 to 516
Data columns (total 10 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Age                  517 non-null    int64 
 1   Height (cm)          517 non-null    int64 
 2   Weight (kg)          517 non-null    int64 
 3   Gender               517 non-null    object
 4   Smoking              517 non-null    object
 5   Having Lung Disease  517 non-null    object
 6   Tidal Volume (mL)    517 non-null    int64 
 7   Smoker_code          517 non-null    int64 
 8   disease              517 non-null    int64 
 9   Gender_codes         517 non-null    int64 
dtypes: int64(7), object(3)
memory usage: 40.5+ KB


# **BUILDING LINEAR REGRESSION FROM SCRATCH**

**MAKING TRAIN TEST AND VALIDATION DATA**

In [34]:
input,target=data[['Age','Height (cm)','Weight (kg)','Smoker_code','Gender_codes','disease']].values,data['Tidal Volume (mL)'].values

In [35]:
from sklearn.model_selection import train_test_split
xtrain,xtest,ytrain,ytest=train_test_split(input,target,test_size=0.1,random_state=42)
print(xtrain.shape)
print(xtest.shape)
print(ytrain.shape)
print(ytest.shape)

(465, 6)
(52, 6)
(465,)
(52,)


In [36]:
xtrain,xval,ytrain,yval=train_test_split(xtrain,ytrain,test_size=0.1,random_state=42)
print(xtrain.shape)
print(xval.shape)
print(ytrain.shape)
print(yval.shape)

(418, 6)
(47, 6)
(418,)
(47,)
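The shapes printed above follow from applying a 10% hold-out twice. The sketch below reproduces the arithmetic under the assumption that the held-out part is rounded up to a whole number of rows, which is consistent with the printed shapes; `split_sizes` is a hypothetical helper, not a scikit-learn function.

```python
import math

def split_sizes(n_samples, test_size):
    """Sizes of a fractional split, assuming the held-out part is rounded up."""
    n_held_out = math.ceil(n_samples * test_size)
    return n_samples - n_held_out, n_held_out

n_rest, n_test = split_sizes(517, 0.1)      # first split: test set
n_train, n_val = split_sizes(n_rest, 0.1)   # second split: validation set
print(n_train, n_val, n_test)
```

The result is 418 training, 47 validation and 52 test rows, or roughly 81%, 9% and 10% of the 517 records.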


**APPLYING FEATURE SCALING TO REQUIRED COLUMNS**

In [37]:
test=xtrain[:,0:3]
test

array([[ 57, 162,  67],
       [ 54, 158,  61],
       [ 55, 162,  64],
       ...,
       [ 50, 157,  61],
       [ 35, 179,  80],
       [ 59, 161,  62]])

In [38]:
xtrain[:,3:]

array([[0, 0, 1],
       [0, 0, 1],
       [0, 0, 1],
       ...,
       [1, 0, 0],
       [0, 1, 0],
       [1, 0, 0]])

In [39]:
mean=np.mean(test,axis=0)
std=np.std(test,axis=0)
print(mean)
print(std)

[ 48.92344498 164.35406699  67.73923445]
[11.4021267   8.69768683  8.17227437]


In [40]:
test=(test-mean)/std
test

array([[ 0.7083376 , -0.27065437, -0.09045639],
       [ 0.44522879, -0.73054677, -0.82464613],
       [ 0.53293172, -0.27065437, -0.45755126],
       ...,
       [ 0.09441704, -0.84551986, -0.82464613],
       [-1.22112702,  1.68388829,  1.50028805],
       [ 0.88374347, -0.38562747, -0.70228118]])

In [41]:
xtrain=np.concatenate((test,xtrain[:,3:]),axis=1)

xtrain

array([[ 0.7083376 , -0.27065437, -0.09045639,  0.        ,  0.        ,
         1.        ],
       [ 0.44522879, -0.73054677, -0.82464613,  0.        ,  0.        ,
         1.        ],
       [ 0.53293172, -0.27065437, -0.45755126,  0.        ,  0.        ,
         1.        ],
       ...,
       [ 0.09441704, -0.84551986, -0.82464613,  1.        ,  0.        ,
         0.        ],
       [-1.22112702,  1.68388829,  1.50028805,  0.        ,  1.        ,
         0.        ],
       [ 0.88374347, -0.38562747, -0.70228118,  1.        ,  0.        ,
         0.        ]])

In [42]:
print(xtrain)

[[ 0.7083376  -0.27065437 -0.09045639  0.          0.          1.        ]
 [ 0.44522879 -0.73054677 -0.82464613  0.          0.          1.        ]
 [ 0.53293172 -0.27065437 -0.45755126  0.          0.          1.        ]
 ...
 [ 0.09441704 -0.84551986 -0.82464613  1.          0.          0.        ]
 [-1.22112702  1.68388829  1.50028805  0.          1.          0.        ]
 [ 0.88374347 -0.38562747 -0.70228118  1.          0.          0.        ]]


In [43]:
test2=xval[:,0:3]
test2

array([[ 59, 157,  60],
       [ 60, 161,  65],
       [ 58, 158,  61],
       [ 41, 181,  82],
       [ 35, 181,  84],
       [ 58, 164,  66],
       [ 53, 166,  65],
       [ 52, 159,  62],
       [ 57, 155,  62],
       [ 54, 160,  61],
       [ 32, 174,  75],
       [ 28, 176,  79],
       [ 60, 162,  64],
       [ 59, 155,  61],
       [ 50, 170,  78],
       [ 31, 172,  74],
       [ 58, 162,  67],
       [ 35, 177,  81],
       [ 51, 157,  63],
       [ 57, 163,  69],
       [ 60, 158,  63],
       [ 60, 158,  65],
       [ 39, 180,  80],
       [ 59, 158,  60],
       [ 54, 155,  63],
       [ 56, 162,  67],
       [ 60, 160,  64],
       [ 29, 175,  77],
       [ 60, 158,  64],
       [ 55, 155,  60],
       [ 50, 157,  62],
       [ 36, 181,  84],
       [ 34, 172,  74],
       [ 58, 155,  60],
       [ 60, 167,  72],
       [ 58, 157,  60],
       [ 32, 178,  80],
       [ 25, 169,  71],
       [ 60, 158,  62],
       [ 41, 174,  75],
       [ 57, 159,  63],
       [ 58, 161

In [44]:
test2=(test2-mean)/std
test2

array([[ 0.88374347, -0.84551986, -0.94701109],
       [ 0.97144641, -0.38562747, -0.33518631],
       [ 0.79604053, -0.73054677, -0.82464613],
       [-0.6949094 ,  1.91383449,  1.74501796],
       [-1.22112702,  1.91383449,  1.98974787],
       [ 0.79604053, -0.04070818, -0.21282135],
       [ 0.35752585,  0.18923802, -0.33518631],
       [ 0.26982291, -0.61557367, -0.70228118],
       [ 0.7083376 , -1.07546606, -0.70228118],
       [ 0.44522879, -0.50060057, -0.82464613],
       [-1.48423583,  1.1090228 ,  0.88846326],
       [-1.83504758,  1.338969  ,  1.37792309],
       [ 0.97144641, -0.27065437, -0.45755126],
       [ 0.88374347, -1.07546606, -0.82464613],
       [ 0.09441704,  0.64913041,  1.25555813],
       [-1.57193877,  0.87907661,  0.76609831],
       [ 0.79604053, -0.27065437, -0.09045639],
       [-1.22112702,  1.4539421 ,  1.622653  ],
       [ 0.18211997, -0.84551986, -0.57991622],
       [ 0.7083376 , -0.15568128,  0.15427352],
       [ 0.97144641, -0.73054677, -0.579

In [45]:
xval=np.concatenate((test2,xval[:,3:]),axis=1)

In [46]:
xval

array([[ 0.88374347, -0.84551986, -0.94701109,  1.        ,  0.        ,
         0.        ],
       [ 0.97144641, -0.38562747, -0.33518631,  1.        ,  0.        ,
         0.        ],
       [ 0.79604053, -0.73054677, -0.82464613,  1.        ,  0.        ,
         0.        ],
       [-0.6949094 ,  1.91383449,  1.74501796,  0.        ,  1.        ,
         0.        ],
       [-1.22112702,  1.91383449,  1.98974787,  0.        ,  1.        ,
         0.        ],
       [ 0.79604053, -0.04070818, -0.21282135,  0.        ,  0.        ,
         0.        ],
       [ 0.35752585,  0.18923802, -0.33518631,  0.        ,  0.        ,
         0.        ],
       [ 0.26982291, -0.61557367, -0.70228118,  0.        ,  0.        ,
         1.        ],
       [ 0.7083376 , -1.07546606, -0.70228118,  0.        ,  0.        ,
         1.        ],
       [ 0.44522879, -0.50060057, -0.82464613,  0.        ,  0.        ,
         0.        ],
       [-1.48423583,  1.1090228 ,  0.88846326,  0.

In [47]:
test3=xtest[:,0:3]
test3

array([[ 59, 161,  65],
       [ 58, 160,  64],
       [ 58, 158,  64],
       [ 29, 174,  75],
       [ 55, 158,  61],
       [ 53, 158,  60],
       [ 57, 161,  61],
       [ 57, 158,  64],
       [ 55, 160,  62],
       [ 27, 181,  84],
       [ 55, 160,  65],
       [ 59, 157,  60],
       [ 31, 179,  81],
       [ 34, 175,  78],
       [ 35, 178,  77],
       [ 60, 161,  61],
       [ 33, 178,  85],
       [ 38, 181,  80],
       [ 33, 178,  80],
       [ 26, 174,  75],
       [ 31, 177,  80],
       [ 60, 162,  67],
       [ 52, 159,  62],
       [ 31, 180,  84],
       [ 60, 155,  60],
       [ 56, 162,  65],
       [ 30, 181,  81],
       [ 57, 155,  61],
       [ 27, 180,  83],
       [ 56, 157,  63],
       [ 55, 162,  67],
       [ 56, 158,  60],
       [ 30, 174,  75],
       [ 49, 158,  63],
       [ 25, 169,  70],
       [ 57, 160,  61],
       [ 56, 160,  62],
       [ 59, 159,  64],
       [ 59, 160,  62],
       [ 54, 157,  62],
       [ 59, 158,  60],
       [ 38, 174

In [48]:
test3=(test3-mean)/std
test3

array([[ 0.88374347, -0.38562747, -0.33518631],
       [ 0.79604053, -0.50060057, -0.45755126],
       [ 0.79604053, -0.73054677, -0.45755126],
       [-1.74734464,  1.1090228 ,  0.88846326],
       [ 0.53293172, -0.73054677, -0.82464613],
       [ 0.35752585, -0.73054677, -0.94701109],
       [ 0.7083376 , -0.38562747, -0.82464613],
       [ 0.7083376 , -0.73054677, -0.45755126],
       [ 0.53293172, -0.50060057, -0.70228118],
       [-1.92275051,  1.91383449,  1.98974787],
       [ 0.53293172, -0.50060057, -0.33518631],
       [ 0.88374347, -0.84551986, -0.94701109],
       [-1.57193877,  1.68388829,  1.622653  ],
       [-1.30882995,  1.2239959 ,  1.25555813],
       [-1.22112702,  1.56891519,  1.13319318],
       [ 0.97144641, -0.38562747, -0.82464613],
       [-1.39653289,  1.56891519,  2.11211283],
       [-0.95801821,  1.91383449,  1.50028805],
       [-1.39653289,  1.56891519,  1.50028805],
       [-2.01045345,  1.1090228 ,  0.88846326],
       [-1.57193877,  1.4539421 ,  1.500

In [49]:
xtest=np.concatenate((test3,xtest[:,3:]),axis=1)
xtest

array([[ 0.88374347, -0.38562747, -0.33518631,  1.        ,  0.        ,
         0.        ],
       [ 0.79604053, -0.50060057, -0.45755126,  0.        ,  0.        ,
         1.        ],
       [ 0.79604053, -0.73054677, -0.45755126,  1.        ,  0.        ,
         0.        ],
       [-1.74734464,  1.1090228 ,  0.88846326,  0.        ,  1.        ,
         0.        ],
       [ 0.53293172, -0.73054677, -0.82464613,  0.        ,  0.        ,
         1.        ],
       [ 0.35752585, -0.73054677, -0.94701109,  0.        ,  0.        ,
         1.        ],
       [ 0.7083376 , -0.38562747, -0.82464613,  1.        ,  0.        ,
         0.        ],
       [ 0.7083376 , -0.73054677, -0.45755126,  0.        ,  0.        ,
         0.        ],
       [ 0.53293172, -0.50060057, -0.70228118,  0.        ,  0.        ,
         0.        ],
       [-1.92275051,  1.91383449,  1.98974787,  0.        ,  1.        ,
         0.        ],
       [ 0.53293172, -0.50060057, -0.33518631,  0.

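Feature scaling is now complete. The numeric columns (age, height, weight) are on a comparable scale, so features with large raw values no longer dominate features with small values during gradient descent. The validation and test sets are scaled with the mean and standard deviation of the training set.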
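The same three steps (slice the numeric columns, standardize them, concatenate the binary columns back) were repeated for the training, validation and test arrays. They could be wrapped in one function; the sketch below uses a hypothetical helper named `standardize` and the training statistics printed earlier.

```python
import numpy as np

def standardize(X, mean, std, n_numeric=3):
    """Z-score the first n_numeric columns and keep the remaining columns unchanged."""
    X = np.asarray(X, dtype=float)
    scaled = (X[:, :n_numeric] - mean) / std
    return np.concatenate((scaled, X[:, n_numeric:]), axis=1)

# Training-set statistics printed earlier in the notebook
mean = np.array([48.92344498, 164.35406699, 67.73923445])
std = np.array([11.4021267, 8.69768683, 8.17227437])

# First training row: Age, Height, Weight, Smoker_code, Gender_codes, disease
row = np.array([[57, 162, 67, 0, 0, 1]])
scaled_row = standardize(row, mean, std)
print(scaled_row)
```

The scaled row matches the first row of the scaled `xtrain` shown above. Passing the training `mean` and `std` explicitly keeps the validation and test sets from leaking their own statistics into the scaling.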

**MODEL MAKING AND TRAINING**

In [50]:
def compute_cost(X, y, w, b):
    m = X.shape[0]
    cost = 0.0
    for i in range(m):
        f_wb_i = np.dot(X[i], w) + b
        cost = cost + (f_wb_i - y[i])**2
    cost = cost / (2 * m)
    return cost
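Written out, `compute_cost` evaluates the halved mean squared error. The notation is chosen to mirror the code and is otherwise an assumption: $m$ is the number of rows, $x^{(i)}$ is row $i$ of `X`, and $f_{w,b}$ is the linear prediction.

```latex
J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{w,b}\big(x^{(i)}\big) - y^{(i)} \right)^{2},
\qquad
f_{w,b}\big(x^{(i)}\big) = w \cdot x^{(i)} + b
```

With $w = 0$ and $b = 0$ every prediction is zero, so the cost reduces to half the mean of the squared targets.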

In [51]:
winit=np.zeros(6)
binit=0
cost = compute_cost(xtrain, ytrain, winit, binit)
print(cost)

90289.59330143541


In [52]:
def compute_gradient(X, y, w, b):
    m,n = X.shape
    dw = np.zeros((n,))
    db = 0.

    for i in range(m):
        diff = (np.dot(X[i], w) + b) - y[i]
        for j in range(n):
            dw[j] = dw[j] + diff * X[i, j]
        db = db + diff
    dw = dw/m
    db = db/m

    return db, dw
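The two nested loops in `compute_gradient` average the prediction error (for `db`) and the error multiplied by each feature (for `dw`). The same quantities can be computed with matrix operations. The sketch below uses the hypothetical names `compute_cost_vec` and `compute_gradient_vec` and checks them against a row-by-row computation on random data, not on the tidal-volume dataset.

```python
import numpy as np

def compute_cost_vec(X, y, w, b):
    """Halved mean squared error, computed without Python loops."""
    err = X @ w + b - y
    return np.mean(err ** 2) / 2

def compute_gradient_vec(X, y, w, b):
    """Gradients with respect to b and w, returned in the same order as compute_gradient."""
    m = X.shape[0]
    err = X @ w + b - y
    db = err.mean()
    dw = X.T @ err / m
    return db, dw

# Small synthetic check (random data, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))
y = rng.normal(size=20)
w = rng.normal(size=6)
b = 0.5

# Reference values computed row by row, as in the loop version
db_loop = sum(X[i] @ w + b - y[i] for i in range(20)) / 20
dw_loop = sum((X[i] @ w + b - y[i]) * X[i] for i in range(20)) / 20
cost_loop = sum((X[i] @ w + b - y[i]) ** 2 for i in range(20)) / (2 * 20)

db_vec, dw_vec = compute_gradient_vec(X, y, w, b)
cost_vec = compute_cost_vec(X, y, w, b)
print(np.isclose(db_vec, db_loop), np.allclose(dw_vec, dw_loop))
```

On a few hundred rows the loop version is fast enough, so this matters mainly for larger datasets or many more iterations.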

In [53]:
def gradient_descent(X, y, win, bin, alpha, num_iters):
    history = []
    w = win
    b = bin

    for i in range(num_iters):
        db,dw = compute_gradient(X, y, w, b)
        w = w - alpha*dw
        b = b - alpha*db
        if i<100000:
            history.append( compute_cost(X, y, w, b))
        if i% 100 == 0:
            print(compute_cost(X, y, w, b))

    return w, b, history

In [54]:
w_final, b_final,hist = gradient_descent(xtrain, ytrain, winit, binit,0.6, 1000)
print(w_final)
print(b_final)

7082.521169701887
60.626880456831366
54.98602689111693
53.23569015740814
52.68645151530912
52.513948115914644
52.459762599554075
52.44274200150885
52.4373955311943
52.4357161101308
[-14.42290671  13.9059368   20.50251955  -7.11183283  51.03988469
  -8.24124183]
408.2513962129751
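A common sanity check for a hand-written gradient descent is to compare its result with the closed-form least-squares solution. The sketch below does this on synthetic, noise-free data (an assumption made for illustration; the learning rate and iteration count are also illustrative), using the same update rule as `gradient_descent`.

```python
import numpy as np

# Synthetic, noise-free data (for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w, true_b = np.array([3.0, -2.0]), 5.0
y = X @ true_w + true_b

# Plain batch gradient descent with the same update rule as gradient_descent
w, b = np.zeros(2), 0.0
alpha = 0.1
for _ in range(2000):
    err = X @ w + b - y
    w = w - alpha * (X.T @ err) / len(y)
    b = b - alpha * err.mean()

# Closed-form least squares: a column of ones makes the last coefficient the bias
A = np.column_stack([X, np.ones(len(y))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

agree = np.allclose(np.append(w, b), coef, atol=1e-6)
print(agree)
```

Running the same comparison on `xtrain` and `ytrain` would show how close the 1000-iteration result is to the exact minimizer.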


In [55]:
pred=np.dot(xtrain,w_final)+b_final
def rmse(targets, predictions):
    return np.sqrt(np.mean(np.square(targets - predictions)))
rmse(ytrain,pred)

10.240624139977799

In [56]:
pred2=np.dot(xval,w_final)+b_final
rmse(yval,pred2)

13.856317804451933

In [57]:
pred3=np.dot(xtest,w_final)+b_final
rmse(ytest,pred3)

8.056788841359182
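The cost printed during training and the RMSE reported here measure the same error on different scales: the cost is half the mean squared error, so RMSE = sqrt(2 × cost). The sketch below checks this with numbers copied from the outputs above.

```python
import math

# Values copied from the notebook outputs
initial_cost = 90289.59330143541        # cost with w = 0, b = 0
last_printed_cost = 52.4357161101308    # last cost printed during training
reported_train_rmse = 10.240624139977799

# cost = MSE / 2, so RMSE = sqrt(2 * cost)
rmse_from_cost = math.sqrt(2 * last_printed_cost)
rmse_at_start = math.sqrt(2 * initial_cost)

print(round(rmse_from_cost, 2), round(rmse_at_start, 1))
```

The last printed cost corresponds to an RMSE of about 10.24 mL, in line with the training RMSE, compared with roughly 425 mL before training. The validation and test sets hold only 47 and 52 rows, so the differences between the three RMSE values should be read with caution.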

# **COMBINING ALL INTO A CLASS**

In [58]:
class linearRegression:
  def __init__(self):
    # Learned parameters; filled in by train()
    self.w = None
    self.b = None
    self.history = []

  def compute_cost(self, X, y, w, b):
    # Halved mean squared error
    m = X.shape[0]
    cost = 0.0
    for i in range(m):
        f_wb_i = np.dot(X[i], w) + b
        cost = cost + (f_wb_i - y[i])**2
    cost = cost / (2 * m)
    return cost

  def compute_gradient(self, X, y, w, b):
    m,n = X.shape
    dw = np.zeros((n,))
    db = 0.

    for i in range(m):
        diff = (np.dot(X[i], w) + b) - y[i]
        for j in range(n):
            dw[j] = dw[j] + diff * X[i, j]
        db = db + diff
    dw = dw/m
    db = db/m

    return db, dw

  def train(self,xtrain,ytrain,alpha,num_iters):
    # Start from zero weights, one per feature column
    w = np.zeros(xtrain.shape[1])
    b = 0.
    self.history = []

    for i in range(num_iters):
        db,dw = self.compute_gradient(xtrain, ytrain, w, b)
        w = w - alpha*dw
        b = b - alpha*db
        self.history.append(self.compute_cost(xtrain, ytrain, w, b))

    # Store the result on the instance so predict() does not rely on globals
    self.w, self.b = w, b
    return self.w, self.b

  def predict(self,xtest):
    pred=np.dot(xtest,self.w)+self.b
    return pred

In [59]:
model=linearRegression()

In [60]:
model.train(xtrain,ytrain,0.6,1000)

In [61]:
pred3=model.predict(xtrain)

In [62]:
rmse(ytrain,pred3)

10.240624139977799

In [63]:
!pip install gradio


In [64]:
import gradio as gr
def predict(age, height, weight, smoker_code, gender_code, disease_code):
    # Inputs must follow the training column order:
    # Age, Height (cm), Weight (kg), Smoker_code, Gender_codes, disease
    # The three numeric inputs are scaled with the training mean and std.
    input_array = np.array([(age-mean[0])/std[0], (height-mean[1])/std[1], (weight-mean[2])/std[2], smoker_code, gender_code, disease_code]).reshape(1, -1)
    prediction = model.predict(input_array)
    return prediction[0]
interface = gr.Interface(
    fn=predict,
    inputs=[gr.Number(label="Age"),
            gr.Number(label="Height(cm)"),
            gr.Number(label="Weight(kg)"),
            gr.Number(label="Smoking(0:No,1:Yes)"),
            gr.Number(label="Gender_code(0:female,1:male)"),
            gr.Number(label="Lung_Disease(0:No,1:Yes)")],
    outputs=gr.Textbox(label="Prediction")
)
if __name__ == "__main__":
    interface.launch()

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://35b222326a36a9c1ad.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
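The model was trained on columns in the order Age, Height (cm), Weight (kg), Smoker_code, Gender_codes, disease, so any caller has to pass values in that same order. The sketch below checks this outside the web interface. It hard-codes the mean, standard deviation, weights and bias printed earlier, and `predict_tidal_volume` is a hypothetical helper name.

```python
import numpy as np

# Values copied from the training output earlier in the notebook
mean = np.array([48.92344498, 164.35406699, 67.73923445])
std = np.array([11.4021267, 8.69768683, 8.17227437])
w_final = np.array([-14.42290671, 13.9059368, 20.50251955,
                    -7.11183283, 51.03988469, -8.24124183])
b_final = 408.2513962129751

def predict_tidal_volume(age, height, weight, smoker, gender, disease):
    """Hypothetical helper: arguments follow the training column order."""
    numeric = (np.array([age, height, weight]) - mean) / std
    features = np.concatenate((numeric, [smoker, gender, disease]))
    return float(features @ w_final + b_final)

# First row of the dataset: 23 years, 168 cm, 70 kg, male, non-smoker, no lung disease
correct = predict_tidal_volume(23, 168, 70, smoker=0, gender=1, disease=0)
# Same person with the smoker and gender codes accidentally swapped
swapped = predict_tidal_volume(23, 168, 70, smoker=1, gender=0, disease=0)
print(round(correct), round(swapped))
```

With the codes in the right order the prediction is about 504 mL against a recorded 500 mL; with the two codes swapped it drops to about 445 mL. A mismatch between the form fields and the training columns therefore produces plausible-looking but wrong results.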
