<h2><b>COVID-19 India Deaths</b></h2>
<h4><b>Author:</b> Data Science @ Georgia Tech</h4>
<p><b>Reference:</b> <a href="https://medium.com/coders-camp/225-machine-learning-projects-with-python-44d6ea8ace18">Medium</a></p>

<b>Welcome to the COVID-19 self-guided project!</b>

COVID-19 has been one of the deadliest pandemics India has ever faced. In this project, we are going to explore daily COVID-19 case and death data for India.

Here is the schema of the dataset:
<ul>
    <li><code>Date</code>: Contains the date of the record</li>
    <li><code>Date_YMD</code>: Contains date in Year-Month-Day Format</li>
    <li><code>Daily Confirmed</code>: Contains the daily confirmed cases of COVID-19</li>
    <li><code>Daily Deceased</code>: Contains the daily deaths due to COVID-19</li>
</ul>

The modules have been imported for you.

In [None]:
import pandas as pd
import numpy as np

Read the CSV file and return the first five rows from the dataset.


In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Reading the Dataset</b></font></summary>
  <pre>
    <code style="display: block;">
        # Solution code
        data = pd.read_csv("COVID19 data for overall INDIA.csv")
        print(data.head())
    </code>
  </pre>
</details>
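
As a minimal sketch of the same loading step, the snippet below reads a tiny in-memory sample instead of the real file. It assumes the file uses the four columns listed in the schema; the sample rows are made up for illustration, and `parse_dates` is an optional extra that turns `Date_YMD` into a datetime column while reading.

```python
import io

import pandas as pd

# Hypothetical sample rows that mimic the schema; these are not real figures.
sample_csv = io.StringIO(
    "Date,Date_YMD,Daily Confirmed,Daily Deceased\n"
    "30 January 2020,2020-01-30,1,0\n"
    "31 January 2020,2020-01-31,0,0\n"
    "1 February 2020,2020-02-01,0,0\n"
)

# parse_dates converts Date_YMD from text to a datetime dtype while reading.
sample = pd.read_csv(sample_csv, parse_dates=["Date_YMD"])

print(sample.head())   # head() returns up to the first five rows
print(sample.dtypes)   # Date_YMD should now show a datetime64 dtype
```

Checking `dtypes` right after loading is a cheap way to confirm that the date and count columns were read the way you expect.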

Now we will start cleaning the data.

Find out how many values in each column are missing.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Finding Missing Values</b></font></summary>
  <pre>
    <code style="display: block;">
        # Solution code
        data.isnull().sum()
    </code>
  </pre>
</details>

Since every column reports zero missing values, no further handling of missing data is needed.
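
For contrast, here is a small sketch of what the same check reports when a value is missing. The frame below is hypothetical and only reuses two column names from the schema.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value, for illustration only.
sample = pd.DataFrame(
    {
        "Daily Confirmed": [10, 12, np.nan],
        "Daily Deceased": [0, 1, 1],
    }
)

# isnull() marks each missing cell as True; sum() counts the Trues per column.
missing_per_column = sample.isnull().sum()
print(missing_per_column)
```

A nonzero count for any column would be the signal to decide how to handle the gaps, for example by dropping or filling those rows.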

The <code>Date</code> column repeats the information already held in <code>Date_YMD</code>, so we do not need it in our analysis. Drop it from the dataset.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Drop "Date" Column</b></font></summary>
  <pre>
    <code style="display: block;">
        # Solution code
        data = data.drop("Date", axis=1)
    </code>
  </pre>
</details>
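
The sketch below shows the same operation on a hypothetical two-row frame, using the `columns=` keyword as an alternative spelling of `axis=1`. Note that `drop` returns a new DataFrame and leaves the original untouched, which is why the solution reassigns the result to `data`.

```python
import pandas as pd

# Hypothetical rows that mimic the schema; these are not real figures.
sample = pd.DataFrame(
    {
        "Date": ["30 January 2020", "31 January 2020"],
        "Date_YMD": ["2020-01-30", "2020-01-31"],
        "Daily Confirmed": [1, 0],
        "Daily Deceased": [0, 0],
    }
)

# Equivalent to sample.drop("Date", axis=1); the result is a new DataFrame.
trimmed = sample.drop(columns=["Date"])

print(list(trimmed.columns))
print("Date" in sample.columns)  # the original frame still has the column
```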

Next, we move on to the visualization part of the project to look at the trends in India's COVID-19 data.

Create a bar graph showing the daily confirmed COVID-19 cases in India.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Create Bar Graph</b></font></summary>
  <pre>
    <code style="display: block;">
        # Solution code
        import plotly.express as px
        fig = px.bar(data, x='Date_YMD', y='Daily Confirmed')
        fig.show()
    </code>
  </pre>
</details>

As the chart shows, daily confirmed cases peaked between April and May 2021.

Now let's compare the total number of confirmed cases with the total number of deaths due to COVID-19.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Cases + Deaths Visualization</b></font></summary>
  <pre>
    <code style="display: block;">
        # Solution Code
        import plotly.express as px
        # Sum the daily columns to get totals for the whole period.
        cases = data["Daily Confirmed"].sum()
        deceased = data["Daily Deceased"].sum()
        labels = ["Confirmed", "Deceased"]
        values = [cases, deceased]
        # hole=0.5 turns the pie into a donut chart.
        fig = px.pie(data, values=values,
                names=labels,
                title='Total Confirmed Cases vs Total Deaths', hole=0.5)
        fig.show()
    </code>
  </pre>
</details>

Using the given data, calculate and print the overall death rate of COVID-19, that is, total deaths as a percentage of total confirmed cases.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Death Rate Calculation</b></font></summary>
  <pre>
    <code style="display: block;">
        # Solution Code
        death_rate = (data["Daily Deceased"].sum() / data["Daily Confirmed"].sum()) * 100
        print(death_rate)
    </code>
  </pre>
</details>
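
To make the arithmetic concrete, here is a small worked sketch with made-up numbers chosen so the result is easy to check by hand. The values are hypothetical and unrelated to the real dataset.

```python
import pandas as pd

# Hypothetical counts, for illustration only.
sample = pd.DataFrame(
    {
        "Daily Confirmed": [100, 200, 200],
        "Daily Deceased": [1, 2, 2],
    }
)

total_cases = sample["Daily Confirmed"].sum()    # 100 + 200 + 200 = 500
total_deaths = sample["Daily Deceased"].sum()    # 1 + 2 + 2 = 5

# Death rate = total deaths / total confirmed cases, as a percentage.
death_rate = total_deaths / total_cases * 100
print(f"{death_rate:.2f}%")  # 5 / 500 * 100 = 1.00%
```

Summing first and dividing once gives the rate over the whole period; averaging per-day ratios would weight quiet days and peak days equally and generally gives a different number.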

Now, let's have a look at the daily deaths due to COVID-19.

We can do this by using a bar chart.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Daily Deaths Bar Chart</b></font></summary>
  <pre>
    <code style="display: block;">
        # Solution Code
        import plotly.express as px
        fig = px.bar(data, x='Date_YMD', y='Daily Deceased')
        fig.show()
    </code>
  </pre>
</details>

We will be using AutoTS, an automated machine learning library for time series forecasting. Time series data is a sequence of observations recorded in time order, usually at regular intervals, such as the number of deaths reported each day.

If you do not have AutoTS installed, you can run this command in your notebook: <code>!pip install autots</code>.
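
As a rough illustration of what "time series data" means in pandas terms, the sketch below builds a short daily series with hypothetical values and asks pandas for the spacing between observations. This uses plain pandas only; it is not a description of how AutoTS handles <code>frequency='infer'</code> internally.

```python
import pandas as pd

# Hypothetical daily counts indexed by calendar date, for illustration only.
dates = pd.date_range(start="2021-04-01", periods=5, freq="D")
daily_deceased = pd.Series([3, 4, 6, 5, 7], index=dates, name="Daily Deceased")

print(daily_deceased)

# pandas can infer the spacing between observations from the index.
inferred = pd.infer_freq(daily_deceased.index)
print(inferred)  # "D" for evenly spaced daily data
```

The key property is the ordered, evenly spaced date index: a forecasting model uses it to know which date each of the next 30 predicted values belongs to.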

Predict COVID-19 deaths by creating an AutoTS model and print out the forecast.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>AutoTS Model</b></font></summary>
  <pre>
    <code style="display: block;">
        # Solution Code
        from autots import AutoTS
        model = AutoTS(forecast_length=30, frequency='infer', ensemble='simple')
        model = model.fit(data, date_col="Date_YMD", value_col='Daily Deceased', id_col=None)
        prediction = model.predict()
        forecast = prediction.forecast
        print(forecast)
    </code>
  </pre>
</details>

## **Summary**

**Congratulations on completing the COVID-19 project!**

We hope you have learned how COVID-19 impacted India and how AutoTS can be used to forecast time series data.