# **COVID-19: Coronavirus Infection Probability using machine learning**

Coronavirus

Coronaviruses are a group of related RNA viruses that cause diseases in mammals and birds. In humans, these viruses cause respiratory tract infections that can range from mild to lethal. Mild illnesses include some cases of the common cold (which is also caused by certain other viruses, predominantly rhinoviruses), while more lethal varieties can cause SARS, MERS, and COVID-19. SARS-CoV-2, the virus that causes COVID-19, was first identified in the city of Wuhan, China.



1.   Here we will build a simple machine learning model to predict whether a person has a coronavirus infection (or the probability of having one).
2.   The data used here is not official data; it was generated randomly.
3.   Because the data is not real, the model's predictions should not be expected to be accurate.
4.   The goal is only to understand how machine learning can help with this kind of problem.





In [2]:
#importing required libraries
import pandas as pd
import numpy as np
import sklearn
from sklearn.metrics import mean_squared_error

In [3]:
#Reading csv file 

Data=pd.read_csv("/content/randomdata.csv")

Data.head()

   Fever  Age  BodyPain  DifficultyinBreath  RunnyNose  Travel  Cough  Probability
0    101   65         1                   0          0       1      1            1
1     98   59         0                   1          0       1      1            0
2    103   46         0                   1          1       0      0            0
3    104   83         0                   0          0       0      1            0
4     98   98         0                   0          1       0      0            0


As the output above shows, the data contains basic features associated with coronavirus infection (fever, age, body pain, etc.). The last column, Probability, is a binary label: 1 means the person has an infection and 0 means no infection.

In [4]:
#Information of data
Data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1999 entries, 0 to 1998
Data columns (total 8 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   Fever               1999 non-null   int64
 1   Age                 1999 non-null   int64
 2    BodyPain           1999 non-null   int64
 3   DifficultyinBreath  1999 non-null   int64
 4   RunnyNose           1999 non-null   int64
 5   Travel              1999 non-null   int64
 6   Cough               1999 non-null   int64
 7   Probability         1999 non-null   int64
dtypes: int64(8)
memory usage: 125.1 KB


We check the summary of the data so that we can make any corrections it needs (null values, column types, etc.) before processing it further. Here every column has 1999 non-null values of type int64, so no cleaning is required.
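The checks described above can be sketched as follows. This is a minimal illustration on a small made-up frame: the values and the names `sample`, `null_counts`, and `cleaned` are assumptions for the example, not the notebook's data. It counts missing values per column, inspects the column types, and strips stray whitespace from column names.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-dataset: one header has a stray leading space
# and one value is missing.
sample = pd.DataFrame({
    "Fever": [101, 98, 103, 104],
    " BodyPain": [1, 0, np.nan, 0],
    "Probability": [1, 0, 0, 0],
})

# Count missing values per column.
null_counts = sample.isnull().sum()
print(null_counts)

# Inspect column dtypes (a column containing NaN is stored as float64).
print(sample.dtypes)

# Strip whitespace from column names, drop incomplete rows,
# and convert everything back to integers.
cleaned = sample.rename(columns=str.strip).dropna().astype("int64")
print(cleaned.columns.tolist())
```

Dropping incomplete rows is only one option; whether to drop or fill missing values depends on the dataset.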

In [5]:
#Defining our target (Y) and features (X)

X = Data.drop('Probability', axis = 1)

print(X.head())

print("data in Y")

Y=Data['Probability']

Y.head()

   Fever   Age   BodyPain  DifficultyinBreath  RunnyNose  Travel   Cough
0     101   65          1                   0          0        1      1
1      98   59          0                   1          0        1      1
2     103   46          0                   1          1        0      0
3     104   83          0                   0          0        0      1
4      98   98          0                   0          1        0      0
data in Y


0    1
1    0
2    0
3    0
4    0
Name: Probability, dtype: int64

In this section we have defined our target (Y) and our features (X). Our goal is to predict infection from the features, so we have separated the Probability column (Y) from the remaining feature columns (X).

In [8]:
#Splitting train and test data
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.33, random_state = 5)

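In this section we have applied the train_test_split function to split the data into a training set and a test set. With test_size = 0.33, about one third of the rows are held out for testing, and random_state = 5 makes the split reproducible.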
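To see what the split produces, here is a small sketch on placeholder data with the same number of rows (1999) as the dataset above. The random values and the names `X_demo`, `Y_demo`, `X_tr`, and `X_te` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder data with 1999 rows; the values are random, not the notebook's data.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "Fever": rng.integers(98, 105, size=1999),
    "Age": rng.integers(1, 101, size=1999),
})
Y_demo = pd.Series(rng.integers(0, 2, size=1999), name="Probability")

# Same arguments as in the notebook cell.
X_tr, X_te, Y_tr, Y_te = train_test_split(
    X_demo, Y_demo, test_size=0.33, random_state=5
)
print(X_tr.shape, X_te.shape)

# Repeating the call with the same random_state reproduces the same split.
X_tr2, X_te2, Y_tr2, Y_te2 = train_test_split(
    X_demo, Y_demo, test_size=0.33, random_state=5
)
print(X_te.index.equals(X_te2.index))
```

The training and test rows never overlap, which is what lets the test set act as unseen data.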

In [9]:
#Viewing the splits as numpy arrays (the results are only displayed, not stored)

print(X_train.to_numpy())

Y_train.to_numpy()

X_test.to_numpy()

Y_test.to_numpy()

[[101  16   1 ...   0   0   1]
 [ 98  25   1 ...   0   0   0]
 [104  53   0 ...   1   1   0]
 ...
 [103  63   0 ...   0   1   1]
 [104  38   1 ...   1   1   1]
 [ 98 100   0 ...   1   1   1]]


array([0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0,
       1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1,
       0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0,
       1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1,
       0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1,
       0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1,
       0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1,
       1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1,
       0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1,
       1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1,
       1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1,
       0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1,
       1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,
       1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0,

In [10]:
#Importing logistic regression model

from sklearn.linear_model import LogisticRegression

clf =LogisticRegression()

#training the model (fit returns the fitted estimator itself, not predictions)

clf.fit(X_train,Y_train)

In this section of code we have imported the logistic regression model and trained it on the training data using the fit function.
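The notebook holds out a test set but does not score the model on it. One common way to do that is sketched below, assuming accuracy as the metric. The data here is randomly generated for the example (the names `features`, `labels`, `model`, and `test_accuracy` are hypothetical), so the accuracy should land near chance level.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Random placeholder data: labels are unrelated to the features.
rng = np.random.default_rng(0)
n = 500
features = pd.DataFrame({
    "Fever": rng.integers(98, 105, size=n),
    "Age": rng.integers(1, 101, size=n),
    "Cough": rng.integers(0, 2, size=n),
})
labels = pd.Series(rng.integers(0, 2, size=n), name="Probability")

Xa, Xb, Ya, Yb = train_test_split(features, labels, test_size=0.33, random_state=5)

# max_iter is raised as a precaution because the features are not scaled.
model = LogisticRegression(max_iter=1000)
model.fit(Xa, Ya)

# Fraction of held-out rows whose predicted class matches the true label.
test_accuracy = accuracy_score(Yb, model.predict(Xb))
print(f"test accuracy: {test_accuracy:.2f}")
```

With random labels there is nothing to learn, which matches the earlier caveat that this data cannot produce a reliable model.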

In [11]:
#Predicting using model 

#Infection (0,1) prediction 

infection=clf.predict([[98,20,0,1,0,0,0]])

#Infection probability prediction

infection_probability= clf.predict_proba([[98,20,0,0,0,0,1]])

print(infection)

print(infection_probability)

[0]
[[0.47974979 0.52025021]]


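1.    In this part, we have used the trained model to predict both the infection class and the infection probability.
2.    As you can see, there are two kinds of output. The predict function returns the class directly (1 or 0), whereas predict_proba returns one probability per class: the first value (about 0.48) is the probability of no infection and the second (about 0.52) is the probability of infection.
3.    Note that the two calls were given different feature rows, so the predicted class [0] and the 52% probability describe two different hypothetical patients.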
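The relationship between the two outputs can be checked on a single input row. The sketch below trains on randomly generated placeholder data (an assumption, as are the names `demo_clf`, `patient`, `label`, and `proba`) and passes the same row to both functions. Building the row as a DataFrame with the training column names also avoids the feature-name warning scikit-learn emits when a model fitted on a DataFrame is given a plain list.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

cols = ["Fever", "Age", "BodyPain", "DifficultyinBreath",
        "RunnyNose", "Travel", "Cough"]

# Random placeholder training data with the notebook's column names.
rng = np.random.default_rng(0)
n = 300
train = pd.DataFrame({
    "Fever": rng.integers(98, 105, size=n),
    "Age": rng.integers(1, 101, size=n),
    **{c: rng.integers(0, 2, size=n) for c in cols[2:]},
})
target = rng.integers(0, 2, size=n)

demo_clf = LogisticRegression(max_iter=1000).fit(train, target)

# One hypothetical patient, used for BOTH calls.
patient = pd.DataFrame([[98, 20, 0, 1, 0, 0, 0]], columns=cols)
label = demo_clf.predict(patient)
proba = demo_clf.predict_proba(patient)

# Columns of proba follow the order of demo_clf.classes_.
print(demo_clf.classes_)
print(label, proba)

# The predicted class is the one with the larger probability.
print(label[0] == demo_clf.classes_[proba[0].argmax()])
```

Each row of the probability output sums to 1, so for a binary model the second column alone is enough to report the infection probability.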