<div style="color:white;
           display:fill;
           border-radius:25px;
           background-color:Black;
           font-size:210%;
           font-family:Verdana;
           letter-spacing:0.5px">
<p style="padding: 10px;
          color:white;
          text-align:center;"
          >
       WELCOME TO MY NOTEBOOK
</p>
</div>

# About Dataset: Lettuce Growth Days
This dataset describes how different environmental conditions affect the growth of lettuce plants over time. It contains the following features:
1. **Plant Identifier (Plant_ID):** A unique identifier assigned to each plant.
2. **Date (Date):** The date of observation.
3. **Temperature (°C):** The recorded temperature expressed in degrees Celsius.
4. **Humidity (%):** The percentage representing the humidity level.
5. **Total Dissolved Solids (TDS) Value (ppm):** The measurement of Total Dissolved Solids given in parts per million.
6. **pH Level:** The measurement of the environmental pH level.
7. **Growth Days:** The duration in days from the initial growth stage of the plant to its full maturity.

![](https://i.gifer.com/7CH6.gif)


This notebook provides a detailed exploratory data analysis (EDA) of the dataset.

Thank you for exploring my notebook. Please take a moment to upvote my notebook. Your support motivates me to keep improving and sharing valuable insights.😊

# Import Libraries

In [None]:
# import all the libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px

# Read the Dataset

In [None]:
# Read the dataset
dataframe = pd.read_csv("/kaggle/input/lettuce-growth-days/lettuce_dataset.csv", encoding="latin-1")
dataframe.head(10)

In [None]:
dataframe.tail(10)

In [None]:
# shape of the dataset
dataframe.shape

In [None]:
# check the datatype of the dataset
dataframe.info()

In [None]:
# description of dataset in terms of statistics
dataframe.describe()

In [None]:
# Check whether there are any null values in the dataset
dataframe.isna().sum()

> Here we notice that there are no null values in the dataset.

In [None]:
# Check for duplicate rows in the dataset
dataframe.duplicated().sum()

> There are no duplicate rows in the dataset.
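The two checks above can be bundled into one small helper. This is only a sketch: `quality_report` is a hypothetical name, and the toy frame below uses made-up values rather than the real lettuce data.

```python
import pandas as pd

# Toy frame standing in for the lettuce data (values are made up for illustration).
toy = pd.DataFrame(
    {
        "Plant_ID": [1, 1, 2, 2],
        "Temperature (°C)": [20.1, 21.4, 19.8, None],
        "Growth Days": [1, 2, 1, 2],
    }
)


def quality_report(df: pd.DataFrame) -> dict:
    """Hypothetical helper: count missing cells and fully duplicated rows."""
    return {
        "missing_cells": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }


report = quality_report(toy)
print(report)
```

On the real data, `quality_report(dataframe)` should report zero for both counts if the observations above hold.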

In [None]:
# Convert the Date column from text to datetime
dataframe['Date'] = pd.to_datetime(dataframe['Date'])

# Let's Visualize the Correlation Matrix

In [None]:
plt.figure(figsize=(5,5))
sns.heatmap(dataframe.corr(), annot=True, cmap="Reds", fmt=".2f")
plt.show()

> 1. Here we notice that Growth Days and Date have a correlation of 1, whereas pH Level has a correlation of 0 with Growth Days.
> 2. Temperature and Humidity have a correlation of 0.03.
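When the heatmap gets crowded, it can help to pull out just the column for the target and rank the features by the strength of their correlation with it. The sketch below does that on randomly generated toy data (an assumption, not the real dataset), so the printed values carry no meaning beyond illustrating the technique.

```python
import numpy as np
import pandas as pd

# Random toy data; the column names mirror the dataset but the values are made up.
rng = np.random.default_rng(42)
n = 100
toy = pd.DataFrame(
    {
        "Temperature (°C)": rng.uniform(18, 34, n),
        "pH Level": rng.uniform(6.0, 6.8, n),
        "Growth Days": rng.integers(1, 49, n),
    }
)

# Pearson correlation of every feature with the target,
# ordered by absolute strength (strongest first).
target_corr = (
    toy.corr()["Growth Days"]
    .drop("Growth Days")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(target_corr)
```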

# Exploratory Data Analysis

# Univariate Analysis
> Univariate analysis considers only one variable at a time to examine how its values are distributed.
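A histogram has a simple numeric counterpart: bin counts plus a few summary statistics. The sketch below computes them for a single made-up temperature series (hypothetical values, not taken from the dataset).

```python
import numpy as np
import pandas as pd

# Made-up temperature readings, skewed towards the lower end.
temperature = pd.Series(
    [18.5, 19.0, 20.2, 21.1, 22.4, 23.0, 24.8, 30.5, 33.1],
    name="Temperature (°C)",
)

# Bin counts are the numbers a histogram draws as bars.
counts, edges = np.histogram(temperature, bins=3)

# A mean above the median and a positive skew both point to a right tail.
summary = {
    "mean": temperature.mean(),
    "median": temperature.median(),
    "skew": temperature.skew(),
}
print(counts, edges.round(2))
print(summary)
```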

In [None]:
# Data distribution of Temperature Column
sns.histplot(data=dataframe, x="Temperature (°C)", kde=True, color="blue")

In [None]:
# Data distribution of Humidity Column
sns.histplot(data=dataframe, x="Humidity (%)", kde=True, color="red")

In [None]:
# Data distribution of Total Dissolved Solids value
sns.histplot(data=dataframe, x="TDS Value (ppm)", kde=True, color="green")

In [None]:
# Data distribution of Growth Days
sns.histplot(data=dataframe, x="Growth Days", kde=True, color="brown")

In [None]:
# Data distribution of pH Level
sns.histplot(data=dataframe, x="pH Level", kde=True, color="purple")

In [None]:
# Number of observations per month
sns.countplot(data=dataframe, x=dataframe["Date"].dt.month.map({8: "August", 9: "September"}), color="brown")

# Bivariate Analysis
> Bivariate analysis considers two variables at a time to examine how they relate to each other.

In [None]:
# Data Distribution of Growth Days vs Temperature
fig = px.scatter(dataframe, x="Temperature (°C)", y="Growth Days", marginal_x="histogram", marginal_y="histogram",title="Growth Days Vs Temperature", color_discrete_sequence=["red"])
fig.show()

> # Here we see that most of the observations fall in the temperature range of 18 to 25 °C, and only a small portion of the data points lies between 29 and 34 °C, which suggests that lettuce does not tolerate high temperatures well.
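The share of observations in each temperature band can also be measured directly instead of being read off the plot. This is a minimal sketch on made-up readings; on the real data the series would be `dataframe["Temperature (°C)"]`. Note that a sparse band only shows that few readings were recorded there, so any conclusion about heat tolerance remains an interpretation.

```python
import pandas as pd

# Made-up temperature readings for illustration only.
temps = pd.Series(
    [18.2, 19.5, 20.0, 21.7, 22.3, 23.9, 24.6, 25.0, 29.4, 33.8],
    name="Temperature (°C)",
)

# Series.between is inclusive on both ends by default.
in_mild = temps.between(18, 25)
in_hot = temps.between(29, 34)

# The mean of a boolean mask is the fraction of True values.
share_mild = in_mild.mean()
share_hot = in_hot.mean()
print(f"18-25 °C: {share_mild:.0%}, 29-34 °C: {share_hot:.0%}")
```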

In [None]:
# Data Distribution of Growth Days vs pH Level
fig = px.box(dataframe, x="pH Level", y="Growth Days",title="Growth Days Vs pH Level", color_discrete_sequence=["blue"])
fig.show()

> # Here we can see that pH Level has almost no impact on the growth of lettuce; Growth Days is almost the same across the different pH levels.
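A grouped summary gives a numeric check for what the box plot shows. The sketch below uses a tiny made-up frame (not the real data) and compares the median Growth Days per pH Level; a spread close to zero would agree with the observation above.

```python
import pandas as pd

# Made-up values; only the column names follow the dataset.
toy = pd.DataFrame(
    {
        "pH Level": [6.0, 6.0, 6.4, 6.4, 6.8, 6.8],
        "Growth Days": [45, 47, 46, 46, 45, 47],
    }
)

# Median, minimum and maximum Growth Days for each pH Level.
by_ph = toy.groupby("pH Level")["Growth Days"].agg(["median", "min", "max"])

# Difference between the largest and smallest group median.
spread = by_ph["median"].max() - by_ph["median"].min()
print(by_ph)
print("Spread of medians:", spread)
```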

In [None]:
# Data Distribution of Growth Days vs Total Dissolved Solids value in parts per million
fig = px.scatter(dataframe, x="TDS Value (ppm)", y="Growth Days", marginal_x="histogram", marginal_y="histogram",title="Growth Days Vs TDS Value (ppm)", color_discrete_sequence=["green"])
fig.show()

In [None]:
# Data Distribution of Growth Days vs Humidity
fig = px.box(dataframe, x="Humidity (%)", y="Growth Days",title="Growth Days Vs Humidity (%)", color_discrete_sequence=["black"])
fig.show()

> # Here we see that Humidity has little impact on lettuce Growth Days.

In [None]:
# Data Distribution of Growth Days vs Date
fig = px.scatter(dataframe, x="Date", y="Growth Days", marginal_x="histogram", marginal_y="histogram",title="Growth Days Vs Date", color_discrete_sequence=["purple"])
fig.show()

> # Here we can observe that Growth Days increases steadily as the date advances, and Date and Growth Days show a perfect positive correlation.
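One plausible explanation for a correlation of exactly 1 is that Growth Days is simply a counter of days elapsed since the first observation, which makes it a linear function of Date. The dataset description does not confirm this, so treat the sketch below as an assumption; the start date and the 45-day span are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical daily observations for one plant (start date is made up).
toy = pd.DataFrame({"Date": pd.date_range("2023-08-03", periods=45, freq="D")})

# Assumed definition: days elapsed since the first observation, starting at 1.
toy["Growth Days"] = (toy["Date"] - toy["Date"].min()).dt.days + 1

# Represent each date as an integer day number so it can be correlated.
date_as_number = toy["Date"].map(pd.Timestamp.toordinal)

# Pearson correlation between the date and the day counter.
r = np.corrcoef(date_as_number, toy["Growth Days"])[0, 1]
print(round(r, 6))
```

If this assumption holds for the real data, Date carries no information beyond Growth Days itself, so it would not be meaningful to treat it as a predictor of growth.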