# Fuzzy Temperature Prediction System

## Setup: Import Libraries and Load Data

**Dataset:** SML2010 Smart Home Data  
**Goal:** Prepare environment for fuzzy inference system development

---


## Step 1: Import Required Libraries


In [1]:
# Standard data processing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings
# Suppress all warnings to keep cell output short (this also hides deprecation notices)
warnings.filterwarnings('ignore')

# Set visualization style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("✓ Libraries imported successfully")
print(f"  - NumPy version: {np.__version__}")
print(f"  - Pandas version: {pd.__version__}")

✓ Libraries imported successfully
  - NumPy version: 1.26.4
  - Pandas version: 2.1.2


## Step 2: Load Dataset

Load the SML2010 data files from the `data/` folder.


In [12]:
# Define data directory
data_dir = Path('data')

# List available data files (sorted, because glob order is not guaranteed)
data_files = sorted(data_dir.glob('*.txt'))
print(f" Found {len(data_files)} data file(s):")
for file in data_files:
    print(f"  - {file.name}")

 Found 2 data file(s):
  - NEW-DATA-1.T15.txt
  - NEW-DATA-2.T15.txt
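Two files are found, but this notebook goes on to load only `NEW-DATA-1.T15.txt`. If both are needed later, one possible approach is to read each file and stack the rows. The sketch below assumes both files share the same column layout; the helper name `load_all` and the `source_file` column are hypothetical, and the demonstration uses tiny synthetic files rather than the real data.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_all(data_dir, pattern='*.txt'):
    """Read every matching file and stack the rows, tagging each row with its file name."""
    frames = []
    for path in sorted(Path(data_dir).glob(pattern)):
        frame = pd.read_csv(path, sep=r'\s+', header=None)
        frame['source_file'] = path.name  # hypothetical bookkeeping column
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# Demonstration on two tiny synthetic files (not real SML2010 data)
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'a.txt').write_text("1 2\n3 4\n")
    Path(tmp, 'b.txt').write_text("5 6\n")
    combined = load_all(tmp)

print(combined)
```

Keeping the file name on each row makes it possible to split the data by file again later, for example to hold one file out for evaluation.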


In [3]:
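# Load the first data file
# SML2010 files are whitespace-separated. With header=None every line is read
# as data, including a leading line of column names if the file has one.
data_file = data_dir / 'NEW-DATA-1.T15.txt'

# Read the data (raw string so that \s is not treated as a string escape)
df = pd.read_csv(data_file, sep=r'\s+', header=None)

print("✓ Data loaded successfully")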
print(f"  - Shape: {df.shape}")
print(f"  - Rows: {df.shape[0]:,}")
print(f"  - Columns: {df.shape[1]}")

✓ Data loaded successfully
  - Shape: (2765, 25)
  - Rows: 2,765
  - Columns: 25


## Step 3: Initial Data Exploration

Quick look at the data structure and contents.


In [10]:
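# Basic statistics (every column is object dtype here, so describe()
# reports count/unique/top/freq rather than mean/std/min/max)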
print("Data statistics:")
print(df.describe())

Data statistics:
                0      1       2        3     4        5      6        7   \
count         2765   2765    2765     2765  2765     2765   2765     2765   
unique          31     97    2352     2317   244     1662   1670     2405   
top     27/03/2012  23:30  21.304  19.2633    15  211.659  209.6  46.6747   
freq            96     29       5        5   519        7     13        4   

             8       9   ...    15    16        17    18    19    20       21  \
count      2765    2765  ...  2765  2765      2765  2765  2765  2765     2765   
unique     2404     786  ...  1212  1234      1460     2     2     2     2508   
top     49.5627  11.524  ...     0     0  -3.18533     0     0     0  17.2827   
freq          5     268  ...  1388  1399       138  2764  2764  2764        4   

            22    23              24  
count     2765  2765               1  
unique    2582    15               1  
top     60.432     2  24:Day_Of_Week  
freq         3   429               
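The summary above shows `unique`, `top`, and `freq` rather than means and standard deviations, which indicates that every column was read as text. One way to get numeric statistics is to coerce the columns with `pd.to_numeric`. The sketch below works on a small synthetic frame (not SML2010 values) that mimics a label row mixed into numeric data.

```python
import pandas as pd

# Synthetic stand-in: a row of labels mixed into otherwise numeric columns
raw = pd.DataFrame({0: ['3:Temp', '20.5', '21.5'], 1: ['4:Hum', '40', '42']})

# Coerce every column; tokens that are not numbers become NaN
numeric = raw.apply(pd.to_numeric, errors='coerce')

# Drop rows that contain no numeric value at all (the label row here)
numeric = numeric.dropna(how='all')

summary = numeric.describe()
print(summary)
```

In `df`, columns 0 and 1 hold date and time strings, so they would become all-NaN under this coercion. Restricting the call to the measurement columns, for example `df.iloc[:, 2:]`, avoids that.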

In [11]:
# Check for missing values
missing = df.isnull().sum()
print("Missing values per column:")
print(missing[missing > 0] if missing.sum() > 0 else "No missing values found")

Missing values per column:
24    2764
dtype: int64
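Column 24 has a single non-null value, `24:Day_Of_Week`, and 2,764 missing entries. That pattern suggests one line of the file is a row of column names that holds one more token than the data rows and was read as data. A minimal sketch of one way to handle this follows. It assumes the first line has the form `# 1:Name 2:Name ...`; the helper name `read_with_header` is hypothetical, and the sample text is synthetic.

```python
import io
import re

import pandas as pd


def read_with_header(text):
    """Parse whitespace-separated text whose first line is '# 1:Name 2:Name ...'."""
    header, _, body = text.partition('\n')
    # Drop the leading '#' token and strip the 'N:' index prefix from each name
    names = [re.sub(r'^\d+:', '', token) for token in header.split() if token != '#']
    # Everything after the first line is data; names supplies the column labels
    return pd.read_csv(io.StringIO(body), sep=r'\s+', header=None, names=names)


# Synthetic three-column sample in the assumed layout (not real SML2010 values)
sample = (
    "# 1:Date 2:Time 3:Temperature\n"
    "01/01/2000 00:00 20.5\n"
    "01/01/2000 00:15 20.7\n"
)
clean = read_with_header(sample)
print(clean.dtypes)
```

If the assumption holds for the real file, `read_with_header(data_file.read_text())` would return named columns with no spurious 25th column, and the measurement columns would be parsed as numbers.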
