This project will use an Arduino/Raspberry Pi with a DHT11 sensor to collect weather data (temperature and humidity) and then train a regression model to predict future patterns based on the collected data.

Part 1: Setting Up the IoT Device
Components Needed:
Raspberry Pi (recommended) or Arduino
DHT11 Sensor (or DHT22 for more accuracy)
Jumper Wires
Breadboard
Step 1: Connect the DHT11 Sensor
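A DHT11 module has three pins (the bare four-pin sensor leaves one pin unconnected and needs a pull-up resistor between DATA and VCC):

VCC: Connect to 5V on Arduino, or to 3.3V on Raspberry Pi so the DATA line stays within the Pi's 3.3V GPIO limit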
GND: Connect to Ground (GND)
DATA: Connect to a GPIO pin (e.g., GPIO4 on Raspberry Pi)

Step 2: Install Required Libraries on Raspberry Pi
If you're using a Raspberry Pi, install pip and the Adafruit_DHT library. The DHT11 communicates over its own single-wire protocol on an ordinary GPIO pin, so I2C does not need to be enabled:

sudo apt-get update
sudo apt-get install python3-pip
pip3 install Adafruit_DHT


Step 3: Write Code to Collect Data
Here’s a Python script that reads temperature and humidity data from the DHT11 sensor:

import Adafruit_DHT
import time
import csv

# Sensor type and GPIO pin
DHT_SENSOR = Adafruit_DHT.DHT11
DHT_PIN = 4  # Replace with the GPIO pin you used for DATA

# File to save data
filename = "weather_data.csv"

# Open the file in write mode
with open(filename, mode='w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(["Timestamp", "Temperature (C)", "Humidity (%)"])

    # Attempt 720 readings at 5-second intervals (about 1 hour)
    for _ in range(720):
        # read() returns (None, None) when the sensor does not respond in time
        humidity, temperature = Adafruit_DHT.read(DHT_SENSOR, DHT_PIN)

        if humidity is not None and temperature is not None:
            timestamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"Time: {timestamp} Temp: {temperature}C Humidity: {humidity}%")

            # Write data to CSV and flush so rows survive an interrupted run
            writer.writerow([timestamp, temperature, humidity])
            file.flush()

        time.sleep(5)  # Wait for 5 seconds before next reading


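This script creates a file named weather_data.csv (overwriting any existing file of that name) and attempts a reading every 5 seconds for about an hour. Failed readings are skipped, so the file may contain fewer than 720 rows.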
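
The logging loop can also be arranged so that it runs without the sensor attached, which makes it easier to check the CSV handling on a laptop before moving to the Pi. The sketch below is one possible arrangement rather than part of the original script: `collect_readings` and its `read_fn` and `sleep_fn` parameters are hypothetical names, and `read_fn` stands in for a call such as `Adafruit_DHT.read(DHT_SENSOR, DHT_PIN)`.

```python
import csv
import io
import time


def collect_readings(read_fn, out_file, samples, interval_s=5.0, sleep_fn=time.sleep):
    """Log up to `samples` readings to `out_file` as CSV and return the rows written.

    read_fn  -- hypothetical callable returning (humidity, temperature),
                or (None, None) when a reading fails
    sleep_fn -- injectable so the loop can be exercised without real delays
    """
    writer = csv.writer(out_file)
    writer.writerow(["Timestamp", "Temperature (C)", "Humidity (%)"])
    written = 0
    for _ in range(samples):
        humidity, temperature = read_fn()
        if humidity is not None and temperature is not None:
            timestamp = time.strftime("%Y-%m-%d %H:%M:%S")
            writer.writerow([timestamp, temperature, humidity])
            out_file.flush()  # push buffered rows out in case the run is interrupted
            written += 1
        sleep_fn(interval_s)
    return written


# Simulated sensor: the second reading fails, the other two succeed.
fake_readings = iter([(40.0, 22.0), (None, None), (41.0, 23.0)])
buffer = io.StringIO()
rows_written = collect_readings(lambda: next(fake_readings), buffer,
                                samples=3, sleep_fn=lambda _s: None)
print(rows_written)
```

On the Pi, `read_fn` could be `lambda: Adafruit_DHT.read(DHT_SENSOR, DHT_PIN)` and `out_file` a file opened with `newline=''`, as in the script above.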

Part 2: Building the Prediction Model
Now that you have collected data, let's use Python and a basic ML library, scikit-learn, to train a regression model.

Step 1: Install Libraries

pip3 install pandas scikit-learn matplotlib


Step 2: Write Code to Train the Model
Here’s a Python script that reads the data from weather_data.csv, trains one linear regression model for temperature and one for humidity, and plots each model's predictions against held-out readings.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Load the data
data = pd.read_csv("weather_data.csv")

# Data preprocessing: replace timestamps with a sequential sample index
# (a simplification that treats readings as evenly spaced)
data['Timestamp'] = range(len(data))
X = data[['Timestamp']]
y_temp = data['Temperature (C)']
y_humidity = data['Humidity (%)']

# Train-test split (one call keeps the temperature and humidity rows aligned)
X_train, X_test, y_temp_train, y_temp_test, y_humidity_train, y_humidity_test = train_test_split(
    X, y_temp, y_humidity, test_size=0.2, random_state=0
)

# Train the model
temp_model = LinearRegression()
humidity_model = LinearRegression()
temp_model.fit(X_train, y_temp_train)
humidity_model.fit(X_train, y_humidity_train)

# Predict the test set
temp_predictions = temp_model.predict(X_test)
humidity_predictions = humidity_model.predict(X_test)

# Plot results
plt.figure(figsize=(10, 5))

# Temperature plot
plt.subplot(1, 2, 1)
plt.scatter(X_test, y_temp_test, color='blue', label='Actual')
plt.plot(X_test, temp_predictions, color='red', label='Predicted')
plt.xlabel("Time")
plt.ylabel("Temperature (C)")
plt.title("Temperature Prediction")
plt.legend()

# Humidity plot
plt.subplot(1, 2, 2)
plt.scatter(X_test, y_humidity_test, color='blue', label='Actual')
plt.plot(X_test, humidity_predictions, color='green', label='Predicted')
plt.xlabel("Time")
plt.ylabel("Humidity (%)")
plt.title("Humidity Prediction")
plt.legend()

plt.tight_layout()
plt.show()
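
A numeric error measure can complement the plots. The following is a minimal, dependency-free sketch of mean absolute error (MAE) and root mean squared error (RMSE). The function names and the sample values are illustrative assumptions, not output from the sensor. scikit-learn also provides `mean_absolute_error` and `mean_squared_error` in `sklearn.metrics` for the same purpose.

```python
import math


def mae_score(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse_score(actual, predicted):
    """Square root of the average squared difference; penalizes large misses more."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Made-up temperatures in degrees Celsius, for illustration only.
actual_temps = [22.0, 23.0, 24.0, 25.0]
predicted_temps = [22.0, 24.0, 24.0, 23.0]

mae = mae_score(actual_temps, predicted_temps)
rmse = rmse_score(actual_temps, predicted_temps)
print(f"MAE: {mae:.2f} C, RMSE: {rmse:.2f} C")
```

With the training script, the same functions could be called as `mae_score(list(y_temp_test), list(temp_predictions))`.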


Explanation of the ML Model Code
Data Preprocessing: The Timestamp column is replaced with a sequential sample index (0, 1, 2, ...) so that time can serve as a numeric feature.
Train-Test Split: A random 20% of the readings is held out for testing so that the models are evaluated on data they were not trained on.
Model Training: Two linear regression models are trained, one for temperature prediction and another for humidity.
Visualization: The actual vs. predicted values for both temperature and humidity are plotted to visually assess the model's performance.
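
Two of the steps above have alternatives worth considering. Because the logging script skips failed readings, rows are not always exactly 5 seconds apart, so elapsed seconds computed from the logged timestamps can be a more faithful time feature than a row index. And when the goal is to predict later readings from earlier ones, holding out the last portion of the data is a common alternative to a random split. The sketch below illustrates both with the standard library only. `to_elapsed_seconds` and `chronological_split` are hypothetical helper names, and the timestamps are invented.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # format written by the logging script


def to_elapsed_seconds(timestamps):
    """Convert timestamp strings to seconds elapsed since the first one."""
    parsed = [datetime.strptime(ts, TIMESTAMP_FORMAT) for ts in timestamps]
    start = parsed[0]
    return [(t - start).total_seconds() for t in parsed]


def chronological_split(rows, test_fraction=0.2):
    """Split rows in order: the earliest part for training, the latest for testing."""
    n_test = max(1, round(len(rows) * test_fraction))
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]


# Invented timestamps; the gap after the second row mimics a skipped reading.
stamps = [
    "2024-01-01 12:00:00",
    "2024-01-01 12:00:05",
    "2024-01-01 12:00:15",
    "2024-01-01 12:00:20",
    "2024-01-01 12:00:25",
]
elapsed = to_elapsed_seconds(stamps)
train, test = chronological_split(elapsed, test_fraction=0.2)
print(elapsed)
print(train, test)
```

In the pandas script, a comparable approach would be to parse the column with `pd.to_datetime(data['Timestamp'])` and subtract the first value, and to pass `shuffle=False` to `train_test_split` so that the test set is the final 20% of the readings.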

Part 3: Deployment and Real-Time Prediction
Once the model is trained, you could deploy it on your Raspberry Pi to make real-time predictions. For simplicity, you could save the trained model to a file, load it on the Pi, and run predictions on readings as they arrive from the sensor.
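
One lightweight way to do this, sketched below under stated assumptions, is to store only the fitted line's slope and intercept and apply them on the Pi, which avoids needing scikit-learn at prediction time. For a fitted `LinearRegression` with a single feature, these values are available as `coef_[0]` and `intercept_`. The file name `temp_model.json`, the helper names, and the coefficient values are hypothetical.

```python
import json
import os
import tempfile


def save_linear_model(path, slope, intercept):
    """Write the coefficients of a one-feature linear model to a JSON file."""
    with open(path, "w") as f:
        json.dump({"slope": float(slope), "intercept": float(intercept)}, f)


def load_linear_model(path):
    """Read the coefficients back and return them as a (slope, intercept) tuple."""
    with open(path) as f:
        params = json.load(f)
    return params["slope"], params["intercept"]


def predict(slope, intercept, time_index):
    """Evaluate the fitted line at a given time index."""
    return slope * time_index + intercept


# Hypothetical coefficients, e.g. temp_model.coef_[0] and temp_model.intercept_.
model_path = os.path.join(tempfile.gettempdir(), "temp_model.json")
save_linear_model(model_path, slope=0.5, intercept=20.0)

slope, intercept = load_linear_model(model_path)
next_index = 10  # a sample index just past the training data
forecast = predict(slope, intercept, next_index)
print(f"Predicted temperature at index {next_index}: {forecast:.1f} C")
```

A straight line extrapolated far beyond an hour of training data is unlikely to track real weather, so forecasts from this kind of model are best limited to a short horizon.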