<a href="https://colab.research.google.com/github/SushantVij/ML_WeatherPrediction/blob/main/seattle_weather_prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

WEATHER PREDICTION

Install and Import Dependencies

In [None]:
!pip install stats

Read in Data and Process Data

Your code seems to be an initial setup for the data analysis, preprocessing, and modeling steps of your project. However, I noticed a couple of things that might need attention:

1. **Imports and Duplicate Imports:** Several imports overlap. `scipy` is imported as a whole and then again through `from scipy import stats`, and the two separate `from scipy.stats import ...` lines (`ttest_ind`, `pearsonr`) can be merged into one. Consolidating your imports avoids confusion and keeps your code clean.

2. **Unused Imports:** You've imported the `re` module, but I don't see it being used in the code snippet you've provided. If you're not using it, you can remove the import to keep your code tidy.

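3. **Shadowed Import:** `import stats` is immediately followed by `from scipy import stats`, which rebinds the name `stats` to SciPy's statistics module, so the first import has no lasting effect. If you only need SciPy's functions, drop `import stats` (and the matching `pip install stats`).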

4. **Import Order:** Typically, libraries from the Python Standard Library come before third-party libraries. You might want to organize your imports like this to improve readability.

5. **Commented Imports:** You've commented out the import of various metrics from `sklearn.metrics`. Make sure to uncomment these imports when you start using these metrics later in your project.

6. **Documentation and Commenting:** While this is a small snippet and might be part of a larger script, it's good practice to include comments or docstrings to explain what each section of code is doing. This makes it easier for you and others to understand your code's purpose.

7. **Data Loading:** I see that you haven't included the part of the code where you load your weather dataset. Make sure you load your data using `pd.read_csv()` or any appropriate method.

8. **Data Preprocessing:** Your code seems to be missing the actual preprocessing steps like dealing with missing values, outlier removal, and transformation. Ensure you've included these steps in your code.

9. **Model Training:** You've imported various classifiers, but I don't see the actual code for splitting the data, training the models, and evaluating them. Make sure you have sections dedicated to these steps.

10. **Visualization:** You've imported `matplotlib.pyplot` and `seaborn` but haven't included any code for creating visualizations. Ensure you have appropriate code blocks for generating the visualizations you mentioned in your project explanation.

Remember that this is just a snippet, and your actual code will likely be more detailed and comprehensive. Make sure to structure your code logically, include appropriate comments and documentation, and follow best practices for readability and maintainability.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import scipy
import re
import missingno as mso
import stats
from scipy import stats
import pandas as pd
from scipy.stats import ttest_ind
from scipy.stats import pearsonr
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import GradientBoostingClassifier
from xgboost import XGBClassifier
#from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [None]:
from google.colab import files
from IPython.display import Image

In [None]:
uploaded=files.upload()

In [None]:
data=pd.read_csv("/content/seattle-weather.csv")
data.head()

In [None]:
data.shape

In [None]:
import warnings
warnings.filterwarnings('ignore')  # Function to ignore all warnings by setting 'ignore' as a parameter

# Use sns.countplot() to visualize the weather data
sns.countplot(x="weather", data=data, palette='hls')
plt.show()  # Don't forget to show the plot using plt.show()


In [None]:
countrain=len(data[data.weather=='rain'])
print(countrain)
percentage=(countrain/len(data.weather))*100
print(percentage)

In [None]:
countsun=len(data[data.weather=='sun'])
print(countsun)
percentage=(countsun/len(data.weather))*100
print(percentage)

In [None]:
countdrizzle=len(data[data.weather=='drizzle'])
print(countdrizzle)
percentage=(countdrizzle/len(data.weather))*100
print(percentage)

In [None]:
countsnow=len(data[data.weather=='snow'])
print(countsnow)
percentage=(countsnow/len(data.weather))*100
print(percentage)

In [None]:
countfog=len(data[data.weather=='fog'])
print(countfog)
percentage=(countfog/len(data.weather))*100
print(percentage)

It looks like you're calculating and printing the counts and percentages of different weather conditions in your dataset. The code is fine for achieving this task, but it can be improved for readability and maintainability. Instead of repeating the same code for each weather condition, you can create a loop to iterate through all conditions. Here's how you can do it:

```python
weather_conditions = ['rain', 'sun', 'drizzle', 'snow', 'fog']

for condition in weather_conditions:
    count = len(data[data.weather == condition])
    percentage = (count / len(data.weather)) * 100
    print(f"Count of {condition}: {count}")
    print(f"Percentage of {condition}: {percentage:.2f}%")
    print()
```

This way, you'll avoid code duplication and make it easier to maintain and update if you ever add more weather conditions to your dataset.

Additionally, using f-strings (`f"Count of {condition}: {count}"`) for printing provides a more concise and readable way to format your output. The `:.2f` in `{percentage:.2f}%` ensures that the percentage is displayed with two decimal places.
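
As an alternative to the explicit loop, pandas can produce the same counts and percentages in one step with `value_counts()`. The sketch below is a minimal illustration on a hypothetical toy frame (`toy` is not the notebook's `data`), so the numbers are made up.

```python
import pandas as pd

# Hypothetical miniature stand-in for the notebook's `data` frame.
toy = pd.DataFrame(
    {"weather": ["rain", "sun", "rain", "fog", "sun", "rain", "drizzle", "snow"]}
)

counts = toy["weather"].value_counts()                       # rows per condition
percentages = toy["weather"].value_counts(normalize=True) * 100

# Combine both views into one small summary table.
summary = pd.DataFrame({"count": counts, "percentage": percentages.round(2)})
print(summary)
```

Because `value_counts()` discovers the categories itself, no hard-coded list of conditions is needed.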

In [None]:
data[['precipitation','temp_max','temp_min','wind']].describe()
#view some basic statistical details like percentile, mean, std etc. of a data frame or a series of numeric values.

Your code seems to be setting up a 2x2 grid of histograms using Seaborn for visualizing the distribution of different weather attributes. This is a good approach to quickly understand the data distribution. However, there are a couple of points worth mentioning:

Commenting and Explanation: Your comments provide a good overview of what the code is doing, but you can consider adding more detailed explanations about what each attribute represents and why you're visualizing it. This can make your code more informative for someone who's reading it for the first time.

Color Choices: Your color choices ('green', 'red', 'blue', 'orange') are reasonable for differentiation, but you might want to consider using colors that are more distinct to improve clarity, especially for viewers who might be colorblind. Seaborn offers a wide range of color palettes that you can experiment with.

Labeling and Titles: Adding labels to the x and y axes of each subplot and giving your entire figure a title can provide more context to the reader. You can use functions like ax.set_xlabel(), ax.set_ylabel(), and plt.suptitle() to achieve this.

Legend or Title for KDE: Since you're using the kde parameter in sns.histplot() to show kernel density estimates, you might want to include a legend or a title to indicate that the colored line represents the kernel density estimate.

Adjusting Layout: Depending on your data and the content of the histograms, you might need to adjust the layout to prevent overlaps or excessive white space.

Color Palette: To use distinct colors, you can consider using a color palette from Seaborn. For example, you can use sns.color_palette("Set1") to get a set of distinct colors that are visually appealing.
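
The labeling, title, and palette suggestions above can be sketched as follows. This is only an illustration in plain Matplotlib on randomly generated stand-in values (the `columns` dictionary is hypothetical, not the Seattle data), with the qualitative `Set1` colormap supplying distinct colors.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in values; in the notebook these would be columns of `data`.
columns = {
    "precipitation": rng.exponential(3.0, 500),
    "temp_max": rng.normal(16.0, 7.0, 500),
    "temp_min": rng.normal(8.0, 5.0, 500),
    "wind": rng.gamma(4.0, 0.8, 500),
}
colors = plt.get_cmap("Set1").colors  # qualitative palette with distinct hues

fig, axs = plt.subplots(2, 2, figsize=(10, 8))
for ax, (name, values), color in zip(axs.flat, columns.items(), colors):
    ax.hist(values, bins=30, color=color)
    ax.set_xlabel(name)      # label the x axis with the attribute name
    ax.set_ylabel("count")   # label the y axis
fig.suptitle("Distribution of weather attributes")
fig.tight_layout()           # reduce overlaps between subplots
```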

In [None]:
sns.set(style='darkgrid')  # apply seaborn's 'darkgrid' theme (grey background with grid lines)
fig,axs=plt.subplots(2,2,figsize=(10,8))
# kde (bool): a kernel density estimate plot is a method for visualizing the distribution of a dataset, analogous to a histogram.
# If True, compute a kernel density estimate to smooth the distribution and show on the plot as (one or more) line(s).
# Only relevant with univariate data (a single variable, the simplest form of data for statistical analysis).
sns.histplot(data=data,x='precipitation',kde=True,ax=axs[0,0],color='green')#+skewed
sns.histplot(data=data,x='temp_max',kde=True,ax=axs[0,1],color='red')#-skewed+outliers
sns.histplot(data=data,x='temp_min',kde=True,ax=axs[1,0],color='blue')#-skewed+outliers
sns.histplot(data=data,x='wind',kde=True,ax=axs[1,1],color='orange')#+skewed

The code you've provided calculates and prints the skewness and kurtosis of the 'precipitation' attribute in your dataset. These statistics provide insights into the shape and distribution of your data. Skewness measures the asymmetry of the data distribution, while kurtosis measures the heaviness of the tails relative to a normal distribution.

Your code is concise and straightforward for calculating and displaying these statistics. However, to provide more context and clarity, you could consider adding explanations about what skewness and kurtosis values mean in terms of the data distribution. This can help readers who might not be familiar with these concepts understand their significance.


Skewness:
Skewness is a statistical measure that indicates the asymmetry of a probability distribution. In simpler terms, it helps us understand how lopsided or skewed the data distribution is. Skewness is calculated based on the third standardized moment of the data. It can be any real number, and its sign distinguishes three cases:

Positive Skewness: If the distribution is positively skewed, it means that the tail on the right side of the distribution is longer or stretched out compared to the left side. In other words, the majority of the data is concentrated on the left side of the distribution, and there are a few extreme values on the right side.

Negative Skewness: If the distribution is negatively skewed, it means that the tail on the left side of the distribution is longer or stretched out compared to the right side. The majority of the data is concentrated on the right side, and there are a few extreme values on the left side.

Zero Skewness: If the distribution has zero skewness, it indicates that the distribution is symmetric, meaning that the left and right sides are relatively balanced.

Kurtosis:
Kurtosis is a statistical measure that describes how heavy the tails of a distribution are, and how sharp its peak is, compared with a normal distribution. The descriptions below use excess kurtosis (kurtosis minus 3), which is what pandas' `.kurt()` reports, so a normal distribution scores 0. There are generally three types:

Leptokurtic: A distribution with positive excess kurtosis has heavier tails and a sharper peak than a normal distribution. This means that the data has more extreme values (outliers) and is more concentrated around the mean.

Platykurtic: A distribution with negative excess kurtosis has lighter tails and a flatter peak than a normal distribution. This indicates that the data has fewer extreme values.

Mesokurtic: A distribution with zero excess kurtosis has tails and a peak similar to a normal distribution. The amount of data in the tails is roughly the same as in a normal distribution.

In summary, skewness and kurtosis provide valuable insights into the shape, symmetry, and tail behavior of data distributions. Analyzing these measures helps us understand the underlying characteristics of our data, which can be crucial for making informed decisions during data analysis and modeling.
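
To make the definitions above concrete, here is a minimal sketch that computes skewness and excess kurtosis directly from the central moments. The helper `shape_statistics` and both sample lists are hypothetical. Note that pandas' `.skew()` and `.kurt()` apply a bias correction, so their values differ slightly from these plain moment-based ones on small samples.

```python
def shape_statistics(values):
    """Return (skewness, excess kurtosis) from the population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0           # a normal distribution gives 0
    return skewness, excess_kurtosis

symmetric = [1, 2, 3, 4, 5]              # balanced around the mean
right_tailed = [0, 0, 0, 1, 1, 2, 10]    # one large value stretches the right tail

skew_sym, kurt_sym = shape_statistics(symmetric)
skew_right, kurt_right = shape_statistics(right_tailed)
print(f"symmetric:    skew={skew_sym:.3f}, excess kurtosis={kurt_sym:.3f}")
print(f"right-tailed: skew={skew_right:.3f}, excess kurtosis={kurt_right:.3f}")
```

The symmetric sample has zero skewness and negative excess kurtosis (flat, light tails), while the right-tailed sample has positive skewness.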

In [None]:
print("skewness: {:.3f}".format(data['precipitation'].skew()))
print("kurtosis: {:.3f}".format(data['precipitation'].kurt()))

In [None]:
print("skewness: {:.3f}".format(data['temp_max'].skew()))
print("kurtosis: {:.3f}".format(data['temp_max'].kurt()))

In [None]:
print("skewness: {:.3f}".format(data['temp_min'].skew()))
print("kurtosis: {:.3f}".format(data['temp_min'].kurt()))

In [None]:
print("skewness: {:.3f}".format(data['wind'].skew()))
print("kurtosis: {:.3f}".format(data['wind'].kurt()))

In [None]:
#to understand skewness of data
# A violin plot is a cross between a box plot and a kernel density plot that displays data peaks.
#It’s used to show how numerical data is distributed.
sns.set(style='darkgrid')
fig,axs=plt.subplots(2,2,figsize=(10,8))
# violinplot() always draws a density estimate; it has no kde parameter, so that argument is omitted.
sns.violinplot(data=data,x='precipitation',ax=axs[0,0],color='green')
sns.violinplot(data=data,x='temp_max',ax=axs[0,1],color='red')
sns.violinplot(data=data,x='temp_min',ax=axs[1,0],color='blue')
sns.violinplot(data=data,x='wind',ax=axs[1,1],color='orange')

In [None]:
# Boxplot is also used to detect outliers in the data set. It captures the summary of the data efficiently with a simple box and whiskers
# and allows us to compare easily across groups
plt.figure(figsize=(12, 6))
sns.boxplot(x='weather', y='precipitation', data=data, palette='hls')
plt.show()  # Don't forget to show the plot using plt.show()


In [None]:
plt.figure(figsize=(12, 6))
sns.boxplot(x='weather', y='temp_max', data=data, palette='inferno')
plt.show()


In [None]:
plt.figure(figsize=(12, 6))
sns.boxplot(x='weather', y='wind', data=data, palette='hls')
plt.show()


In [None]:
plt.figure(figsize=(12, 6))
sns.boxplot(x='weather', y='temp_min', data=data, palette='inferno')
plt.show()


In [None]:
#Heatmap is a visualization that displays data in a color encoded matrix.
#The intensity of color varies based on the value of the attribute represented in the visualization.
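# Positive correlation between min and max temperature.
# Positive correlation between precipitation and wind.
plt.figure(figsize=(12,6))
# Correlate numeric columns only; 'date' and 'weather' are still non-numeric here.
sns.heatmap(data.select_dtypes(include='number').corr(),annot=True,cmap='coolwarm')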

Preprocessing and Cleaning

In [None]:
data.isna().sum()

In [None]:
plt.figure(figsize=(12,6))
axz=plt.subplot(1,2,2)
# plt.subplot() takes three arguments that describe the layout of the figure.
# The layout is organized in rows and columns, given by the first and second arguments.
# The third argument is the index of the current plot.
# Here the figure has 1 row and 2 columns, and this plot is the second one.
# The missingno library offers a convenient way to visualize the distribution of NaN values
# and works directly with pandas DataFrames.
# This bar chart shows how complete each column is (bar height = number of non-missing values).
mso.bar(data.drop(['date'],axis=1),ax=axz,fontsize=12)

In [None]:
data.dtypes

In [None]:
data=data.drop(['date'],axis=1)

Train Model

In [None]:
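# DataFrame.quantile() returns the value at the given quantile for each column.
# Restrict the IQR rule to numeric columns; 'weather' is still a string column here.
num=data.select_dtypes(include='number')
Q1=num.quantile(0.25)
Q3=num.quantile(0.75)
IQR=Q3-Q1
# Drop every row that has an outlier in at least one numeric column.
data=data[~((num<(Q1-1.5*IQR))|(num>(Q3+1.5*IQR))).any(axis=1)]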
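
The interquartile-range (IQR) rule used in the cell above can be illustrated on a single small array. This is a minimal sketch with hypothetical values: anything below Q1 - 1.5*IQR or above Q3 + 1.5*IQR is treated as an outlier and dropped.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 100])  # hypothetical sample with one extreme value

q1, q3 = np.percentile(values, [25, 75])  # first and third quartiles
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep only the values that fall inside the fences.
kept = values[(values >= lower) & (values <= upper)]
print(f"fences: [{lower}, {upper}], kept: {kept}")
```

Here the quartiles are 2.25 and 4.75, so the fences are -1.5 and 8.5 and only the value 100 is removed.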


In [None]:
import numpy as np
# np.sqrt() returns the non-negative square root of each element; it is applied here to the two positively skewed columns.
data.precipitation=np.sqrt(data.precipitation)
data.wind=np.sqrt(data.wind)

In [None]:
sns.set(style='darkgrid')
fig,axs=plt.subplots(2,2,figsize=(10,8))
sns.histplot(data=data,x='precipitation',kde=True,ax=axs[0,0],color='green')
sns.histplot(data=data,x='temp_max',kde=True,ax=axs[0,1],color='red')
sns.histplot(data=data,x='temp_min',kde=True,ax=axs[1,0],color='blue')
sns.histplot(data=data,x='wind',kde=True,ax=axs[1,1],color='orange')

In [None]:
data.head()

It looks like you're using the LabelEncoder from scikit-learn to transform the 'weather' column in your dataset into numerical labels. This is a common preprocessing step when dealing with categorical data in machine learning algorithms that require numerical input. However, I noticed a slight misunderstanding in your comment regarding fit_transform(). Let me clarify:

Label Encoding Purpose: The purpose of using LabelEncoder is to convert categorical data (like weather conditions) into numerical labels. This is done to represent categorical data in a format that machine learning algorithms can understand, as many algorithms work with numerical inputs.

fit_transform(): The fit_transform() method combines two steps: fitting the encoder on the data and then transforming the data into encoded labels. In the context of LabelEncoder, the "fitting" process computes the mapping between unique categories and integer labels, and the "transforming" process applies this mapping to the data to encode it.

In [None]:
lc=LabelEncoder()
# fit_transform() learns the sorted set of unique labels and replaces each label with its integer index.
# (Unlike StandardScaler, LabelEncoder does not compute means or variances.)
data['weather']=lc.fit_transform(data['weather'])

It appears that you're preparing your data for model training and splitting it into feature (x) and target (y) variables. However, there are a few points to consider and clarify:

Data Preparation: The line x=((data.loc[:,data.columns!='weather']).astype(int)).values[:,0:] selects every column except 'weather', casts it to integers, and assigns the resulting array to x. The cast is lossy: precipitation and wind hold floating-point values (both were square-root transformed earlier), and astype(int) truncates the fractional part, so distinct measurements can collapse into the same integer. Unless integer features are required, keeping the float values preserves more information. The trailing [:,0:] slice selects all rows and columns, so it can be dropped.

Data Splitting: It seems like you're attempting to split the data into features (x) and target (y) variables, which is a necessary step for model training. However, the code you've provided doesn't include the splitting itself. Make sure you use train_test_split from scikit-learn to divide your data into training and testing sets before feeding them into the models.
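
The effect of casting transformed features to integers can be seen on a few numbers. This is a small sketch with hypothetical precipitation values, not rows from the dataset.

```python
import numpy as np

precipitation = np.array([0.3, 4.0, 10.9])  # hypothetical raw values
transformed = np.sqrt(precipitation)        # roughly [0.548, 2.0, 3.302]
as_int = transformed.astype(int)            # truncates toward zero

print(transformed.round(3), as_int)
```

The fractional parts are discarded, so a light shower (0.3) becomes indistinguishable from no precipitation at all.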

In [None]:
x=((data.loc[:,data.columns!='weather']).astype(int)).values[:,0:]
y=data['weather'].values


It looks like you've used the `.unique()` method to obtain the unique values of the 'weather' column after label encoding. The resulting array `[0, 2, 4, 3, 1]` lists the encoded labels in order of first appearance in the data. `LabelEncoder` assigns integers in sorted (alphabetical) order of the original labels, so the five weather conditions map as follows:

- `0`: 'drizzle'
- `1`: 'fog'
- `2`: 'rain'
- `3`: 'snow'
- `4`: 'sun'

These numerical labels will be used as the target variable (`y`) when training your machine learning models. The models will then learn to predict the encoded labels based on the input features (`x`). Remember that during evaluation or when making predictions, you might need to reverse this encoding to interpret the results in terms of the original weather conditions.
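
Rather than tracking the mapping by hand, the fitted encoder can report it and reverse it. The sketch below uses a hypothetical list of labels and a separate `encoder` object. In the notebook the fitted encoder is `lc`, and `classes_` and `inverse_transform()` are standard `LabelEncoder` members.

```python
from sklearn.preprocessing import LabelEncoder

labels = ["drizzle", "rain", "sun", "snow", "fog", "rain"]  # hypothetical sample
encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# classes_ holds the original labels in sorted order; the position is the code.
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform() turns integer predictions back into readable labels.
decoded = encoder.inverse_transform(encoded)
print(list(decoded))
```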

In [None]:
data.weather.unique()

ACCURACY

In [None]:
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.1,random_state=2)

In [None]:

print('There are {} samples in the training set and {} samples in the test set'.format(
x_train.shape[0], x_test.shape[0]))
print()
#For many machine learning algorithms, it is important to scale the data. Let's do that now using sklearn

It looks like you're using the StandardScaler from scikit-learn to standardize your feature data (X) before feeding it into your machine learning models. Standardization is a common preprocessing step that helps ensure that all features have similar scales, which can improve the performance of many machine learning algorithms.

Your code is on the right track, but there are a few points to consider:

Standardization: Standardization involves subtracting the mean and dividing by the standard deviation for each feature. This centers the feature distribution around zero with a standard deviation of 1.

Applying Transformation: You're correctly fitting the StandardScaler on the training data and then transforming both the training and testing data using the same scaler. This ensures that the same transformation is applied to both sets.

By performing standardization, you ensure that the features in your dataset have similar scales, which can prevent certain features from dominating the model's learning process. This preprocessing step is especially important for algorithms that are sensitive to the scale of the input features, such as k-nearest neighbors, support vector machines, and gradient-based algorithms.
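
What the scaler does can be reproduced by hand in a few lines. This is a minimal NumPy sketch on hypothetical arrays (`train` and `test` are illustrative names). The key point is that the mean and standard deviation come from the training data only and are then reused for the test data.

```python
import numpy as np

train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])  # hypothetical features
test = np.array([[2.0, 30.0]])

mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation, as StandardScaler uses

train_std = (train - mean) / std
test_std = (test - mean) / std  # reuse the training statistics, do not refit

print(train_std.mean(axis=0).round(6), train_std.std(axis=0).round(6))
```

After the transformation each training column has mean 0 and standard deviation 1, regardless of its original scale.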

In [None]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
sc.fit(x_train)
X_train_std = sc.transform(x_train)
X_test_std = sc.transform(x_test)




It looks like you've successfully trained a k-nearest neighbors (KNN) classifier and are evaluating its performance on both the training and testing data. The accuracy scores you've printed indicate how well the model is performing. However, there's a slight issue with the accuracy calculation and interpretation. Let me explain:

Accuracy Calculation: The accuracy is calculated as the ratio of correctly predicted instances to the total number of instances. The formula is: (Number of Correct Predictions) / (Total Number of Predictions).

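Interpretation of Accuracy: `score()` returns a value between 0 and 1. Values such as 0.07 and 0.05 are valid numbers but far too low for a working classifier. The cause is a mismatch: the model is fitted on the unscaled `x_train` but scored on the standardized `X_train_std` and `X_test_std`, so it is evaluated on features with a different scale than the ones it was trained on.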
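
The accuracy formula can be checked by hand on a tiny example. The label lists below are hypothetical and only illustrate the arithmetic.

```python
true_labels = [2, 2, 4, 0, 4]       # hypothetical encoded weather labels
predicted_labels = [2, 4, 4, 0, 2]  # hypothetical model output

# Count positions where the prediction matches the true label.
correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
accuracy = correct / len(true_labels)
print(f"{correct} of {len(true_labels)} correct, accuracy = {accuracy:.2f}")
# prints: 3 of 5 correct, accuracy = 0.60
```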

In [None]:
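knn=KNeighborsClassifier()
knn.fit(x_train,y_train)  # fitted on the unscaled features
print('KNN Accuracy:{:.2f}%'.format(knn.score(x_test,y_test)*100))
# NOTE: the two scores below use standardized data although the model was fitted on unscaled data;
# this mismatch explains the very low values. The next cell fits on the standardized data instead.
print('The accuracy of the knn classifier is {:.2f} out of 1 on training data'.format(knn.score(X_train_std, y_train)))
print('The accuracy of the knn classifier is {:.2f} out of 1 on test data'.format(knn.score(X_test_std, y_test)))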

In [None]:
knn = KNeighborsClassifier()
knn.fit(X_train_std, y_train)  # Use standardized data for training
# Calculate and print accuracy on the test data
accuracy_test = knn.score(X_test_std, y_test)
print('KNN Accuracy on Test Data: {:.2f}%'.format(accuracy_test * 100))

# Calculate and print accuracy on the training data
accuracy_train = knn.score(X_train_std, y_train)
print('KNN Accuracy on Training Data: {:.2f}%'.format(accuracy_train * 100))


It looks like you've trained a Support Vector Machine (SVM) classifier and are evaluating its performance using accuracy scores. The code is well-structured and follows the same format as your previous KNN classifier evaluation. However, there's one important point to consider:

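The SVM scores of 0.59 and 0.63 are higher than the 0.07 and 0.05 seen for KNN, but they suffer from the same mismatch: the model is fitted on the unscaled `x_train` and then scored on standardized data. Once training and evaluation use the same feature scale, compare the two scores. Training and test accuracy should be reasonably close, and a training score well above the test score suggests overfitting.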

If there is a substantial gap between training and testing accuracy, it's worth considering techniques like hyperparameter tuning (using techniques like GridSearchCV or RandomizedSearchCV) to improve the model's generalization performance. Additionally, you might want to explore other evaluation metrics such as precision, recall, F1-score, and confusion matrices to get a more comprehensive view of the classifier's performance.
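
The additional metrics mentioned above are available in `sklearn.metrics`. The sketch below applies `confusion_matrix` and `classification_report` to two short hypothetical label lists. In the notebook the inputs would be `y_test` and a model's predictions.

```python
from sklearn.metrics import classification_report, confusion_matrix

true_labels = [2, 2, 4, 0, 4]       # hypothetical encoded labels
predicted_labels = [2, 4, 4, 0, 2]  # hypothetical predictions

# Rows are true classes, columns are predicted classes (sorted label order: 0, 2, 4).
cm = confusion_matrix(true_labels, predicted_labels)
print(cm)

# Per-class precision, recall, and F1-score in one text table.
report = classification_report(true_labels, predicted_labels, zero_division=0)
print(report)
```

Off-diagonal entries of the matrix show which classes are confused with each other, which a single accuracy number hides.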

In [None]:
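svm=SVC()
svm.fit(x_train,y_train)  # fitted on the unscaled features
print('SVM Accuracy:{:.2f}%'.format(svm.score(x_test,y_test)*100))
# NOTE: the two scores below use standardized data although the model was fitted on unscaled data;
# the next cell fits on the standardized data instead.
print('The accuracy of the svm classifier on training data is {:.2f} out of 1'.format(svm.score(X_train_std, y_train)))
print('The accuracy of the svm classifier on test data is {:.2f} out of 1'.format(svm.score(X_test_std, y_test)))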

In [None]:
svm = SVC()
svm.fit(X_train_std, y_train)  # Use standardized data for training

# Calculate and print accuracy on the test data
accuracy_test = svm.score(X_test_std, y_test)
print('SVM Accuracy on Test Data: {:.2f}%'.format(accuracy_test * 100))

# Calculate and print accuracy on the training data
accuracy_train = svm.score(X_train_std, y_train)
print('SVM Accuracy on Training Data: {:.2f}%'.format(accuracy_train * 100))


It looks like you've successfully trained a Gradient Boosting Classifier (GBC) and calculated its accuracy on the test data. The accuracy score of `78.23%` indicates how well the model is performing on the test set. Gradient Boosting is a powerful ensemble method that often provides competitive performance in many scenarios.

Your code seems well-structured, and the accuracy calculation is straightforward. However, remember that accuracy is just one metric for evaluating a classifier's performance. Depending on the problem and the nature of the dataset, you might want to consider using additional evaluation metrics such as precision, recall, F1-score, and confusion matrices.

If you're interested in further improving your model's performance, consider experimenting with hyperparameter tuning using techniques like GridSearchCV or RandomizedSearchCV. These methods can help you find the best combination of hyperparameters for your Gradient Boosting Classifier, potentially leading to even better accuracy and generalization.
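
A hyperparameter search of this kind might look like the following sketch. It runs on synthetic data from `make_classification`, and the small `param_grid` is an illustrative assumption, not a recommendation for the weather dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data: 4 features and 3 classes, unrelated to the weather data.
features, labels = make_classification(
    n_samples=150, n_features=4, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0,
)

# Illustrative grid; every combination is evaluated with 3-fold cross-validation.
param_grid = {"n_estimators": [20, 50], "max_depth": [2, 3]}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(features, labels)

print(search.best_params_, round(search.best_score_, 3))
```

`best_score_` is the mean cross-validated accuracy of the best combination, which is a less optimistic estimate than a single train/test split.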

Overall, you've done a great job in training and evaluating your classifiers. Keep up the good work!

In [None]:
gbc=GradientBoostingClassifier(subsample=0.5,n_estimators=450,max_depth=5,max_leaf_nodes=25)
gbc.fit(x_train,y_train)
print('GBC Accuracy:{:.2f}%'.format(gbc.score(x_test,y_test)*100))


It looks like you've successfully trained an XGBoost Classifier and are evaluating its performance. XGBoost is a popular ensemble learning algorithm that often yields strong results.

A couple of points to note:

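Accuracy Scores: Your XGBoost model seems to be performing well with an accuracy score of 83.06% on the test data. The other two scores (0.11 and 0.06) are valid fractions between 0 and 1 but far too low next to that figure. The cause is the same scale mismatch as before: `xgb.fit(X_train_std, y_train)` returns the estimator itself, so `xgb_clf` and `xgb` are the same object, and the following call `xgb.fit(x_train, y_train)` refits it on unscaled data. Scoring `xgb_clf` on the standardized sets therefore evaluates a model trained on unscaled features.

Duplicated Names: You've named both the imported XGBoost library and the instance of `XGBClassifier` as xgb. After `xgb=XGBClassifier()`, the name xgb no longer refers to the module, which can lead to confusion and unexpected behavior. It's a good practice to use distinct names for your variables to avoid such conflicts.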
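
The name clash can be reproduced with any module. The sketch below uses the standard library's `math` as a stand-in for `xgboost` and a hypothetical alias `helper`. Once the alias is rebound to another object, the module's functions are no longer reachable through it.

```python
import math as helper  # `helper` refers to the math module

print(helper.sqrt(16.0))  # works: the name still points at the module

helper = 2.0  # rebinding: `helper` is now a float, not the module

try:
    helper.sqrt(16.0)
    shadowed = False
except AttributeError:
    # A float has no sqrt attribute, so the call fails.
    shadowed = True

print("module shadowed:", shadowed)
```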

In [None]:
import xgboost as xgb
import warnings
warnings.filterwarnings('ignore')
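xgb=XGBClassifier()  # NOTE: rebinds the name xgb from the xgboost module to a classifier instance

xgb_clf = xgb.fit(X_train_std, y_train)  # fit() returns the estimator itself, so xgb_clf is xgb
xgb.fit(x_train,y_train)  # refits the same object on unscaled data, replacing the fit above
print('XGB Accuracy:{:.2f}%'.format(xgb.score(x_test,y_test)*100))
# NOTE: xgb_clf is now trained on unscaled data but scored on standardized data below, hence the low values.
print('The accuracy of the xgb classifier is {:.2f} out of 1 on training data'.format(xgb_clf.score(X_train_std, y_train)))
print('The accuracy of the xgb classifier is {:.2f} out of 1 on test data'.format(xgb_clf.score(X_test_std, y_test)))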

In [None]:
import xgboost as xgb
import warnings
warnings.filterwarnings('ignore')

# Create an instance of XGBoostClassifier
xgb_clf = xgb.XGBClassifier()

# Train the XGBoost model
xgb_clf.fit(X_train_std, y_train)  # Use standardized data for training
# Calculate and print accuracy on the test data
accuracy_test = xgb_clf.score(X_test_std, y_test)
print('XGB Accuracy on Test Data: {:.2f}%'.format(accuracy_test * 100))

# Calculate and print accuracy on the training data
accuracy_train = xgb_clf.score(X_train_std, y_train)
print('XGB Accuracy on Training Data: {:.2f}%'.format(accuracy_train * 100))


Forecast Away

Your code snippet seems to be using the trained XGBoost classifier to predict the weather condition based on a set of input attributes. The code structure is correct, but there's one small issue with the variable name you've used: input.

input is a built-in Python function that reads a line from the user input. When you use input as a variable name, it can lead to confusion and unexpected behavior. To avoid this, consider using a different variable name, such as input_data or something more descriptive.

In [None]:
import numpy as np

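# Feature order matches training: precipitation, temp_max, temp_min, wind.
input_data = np.array([[0, 25, 19, 4.56774]])
# The classifier was fitted on standardized features, so apply the same fitted scaler (sc) before predicting.
predicted_label = xgb_clf.predict(sc.transform(input_data))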

print('The weather is: ')
if predicted_label == 0:
    print('Drizzle')
elif predicted_label == 1:
    print('Fog')
elif predicted_label == 2:
    print('Rain')
elif predicted_label == 3:
    print('Snow')
else:
    print('Sun')
