# <center> Project Name - Solar Radiation Prediction </center>


# <center> Solario </center> 
Company Introduction
Your client for this project is a major producer of solar energy.

Recently, the company launched an initiative to build a waste-free power source, which could be a significant step towards protecting the environment and mitigating climate change.
They have partnered with NASA to obtain meteorological data so that they can build an efficient system to measure solar radiation.
They have been working on an app that gives updates on solar radiation at different times of the day.

Current Scenario
Solar radiation is often defined as the energy reaching the earth from the sun.
People have been measuring the energy coming from the sun for centuries, and today more people are measuring solar radiation than ever before. But we are still in need of an efficient and accurate way of measuring solar radiation.




# Problem Statement 
The current process suffers from the following problems:

The solar radiation is dependent on Humidity and Temperature.
Measuring solar radiation using different devices requires a lot of manpower and resources.
It takes a lot of time fixing the sensors and interpreting the readings from the different devices which becomes inefficient as time goes by.

The company has hired you as data science consultants. They want to automate the process of predicting solar radiation from the recorded meteorological measurements.

Your Role
You are given a dataset containing measurements for the past 4 months.
Your task is to build a regression model using the dataset.
Because there was no machine learning model for this problem in the company, you don’t have a quantifiable win condition. You need to build the best possible model.

Project Deliverables<br>
Deliverable: Solar Radiation Prediction.<br>
Machine Learning Task: Regression<br>
Target Variable: Radiation<br>
Win Condition: N/A (best possible model)

Evaluation Metric <br>
The model evaluation will be based on the RMSE score. <br>
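
For reference, RMSE is the square root of the mean squared difference between observed and predicted values. The sketch below is a plain-Python illustration; the function name `rmse` and the sample numbers are assumptions made for this example and do not come from the project data.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Illustrative values only (not taken from the dataset).
example_rmse = rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
print(example_rmse)
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones, and the result is in the same unit as the target (W/m² here).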


# Data Description

<table>
<tr>	<th>	Column Name	</th>	<th>	Description	</th>	</tr>
<tr>	<td>	Id	</td>	<td>	Unique identity of each observation.	</td>	</tr>
<tr>	<td>	UNIXTime	</td>	<td>	Time of the reading as a running total of seconds (Unix time).	</td>	</tr>
<tr>	<td>	Data	</td>	<td>	Day and time at which the reading was started in hh:mm:ss 24-hour format.	</td>	</tr>
<tr>	<td>	Time	</td>	<td>	Time at which the reading was taken in 24-hour format.	</td>	</tr>
<tr>	<td>	Temperature	</td>	<td>	Temperature during the daytime in degrees Fahrenheit (°F).	</td>	</tr>
<tr>	<td>	Pressure	</td>	<td>	Barometric pressure in inches of mercury (inHg).	</td>	</tr>
<tr>	<td>	Humidity	</td>	<td>	Humidity during the daytime in per cent (%).	</td>	</tr>
<tr>	<td>	WindDirection(Degrees)	</td>	<td>	Direction of wind in degrees (°).	</td>	</tr>
<tr>	<td>	Speed	</td>	<td>	Speed of wind in miles per hour (mph).	</td>	</tr>
<tr>	<td>	TimeSunRise	</td>	<td>	Time of sunrise in the morning (Hawaii time).	</td>	</tr>
<tr>	<td>	TimeSunSet	</td>	<td>	Time of sunset in the evening (Hawaii time).	</td>	</tr>
<tr>	<td>	Radiation	</td>	<td>	Solar radiation measured in watts per square meter (W/m²).	</td>	</tr>
</table>

In [1]:
## Importing necessary libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

from pandas_profiling import ProfileReport
import dtale

import warnings
warnings.filterwarnings('ignore')

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split,cross_val_score,GridSearchCV
from sklearn.metrics import mean_squared_error
from sklearn.feature_selection import SelectFromModel

from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor,AdaBoostRegressor,GradientBoostingRegressor
from xgboost import XGBRegressor






In [2]:
train_data = pd.read_csv('solar_train.csv')
train_data.shape

(26148, 12)

In [3]:
print(train_data.head())
train_data.tail()

      Id    UNIXTime                    Data      Time  Temperature  Pressure  \
0   4152  1473879005   9/14/2016 12:00:00 AM  08:50:05           57     30.45   
1  13047  1476293121  10/12/2016 12:00:00 AM  07:25:21           50     30.47   
2   7420  1477993220  10/31/2016 12:00:00 AM  23:40:20           47     30.48   
3   6508  1473013505    9/4/2016 12:00:00 AM  08:25:05           57     30.47   
4  29110  1481885434  12/16/2016 12:00:00 AM  00:50:34           41     30.23   

   Humidity  WindDirection(Degrees)  Speed TimeSunRise TimeSunSet  Radiation  
0        68                   26.70   4.50    06:10:00   18:26:00     680.04  
1        96                  144.96  10.12    06:16:00   18:02:00     277.37  
2        56                  119.52   3.37    06:23:00   17:49:00       1.29  
3        93                   38.61   2.25    06:08:00   18:35:00     544.75  
4       103                  177.55   2.25    06:50:00   17:46:00       1.22  


          Id    UNIXTime                    Data      Time  Temperature  Pressure  \
26143  29802  1481676309  12/13/2016 12:00:00 AM  14:45:09           50     30.28
26144   5390  1473426025    9/9/2016 12:00:00 AM  03:00:25           44     30.37
26145    860  1474966519   9/26/2016 12:00:00 AM  22:55:19           48     30.42
26146  15795  1475451021   10/2/2016 12:00:00 AM  13:30:21           56     30.42
26147  23654  1478255721   11/4/2016 12:00:00 AM  00:35:21           45     30.45

       Humidity  WindDirection(Degrees)  Speed TimeSunRise TimeSunSet  Radiation
26143        96                  304.22  12.37    06:48:00   17:45:00     216.29
26144       100                  162.80   3.37    06:09:00   18:31:00       1.47
26145        64                  158.90   4.50    06:12:00   18:15:00       1.20
26146        99                   55.72  13.50    06:14:00   18:10:00     659.12
26147        56                  136.11   5.62    06:25:00   17:47:00       1.18
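
The `UNIXTime` and `Time` columns appear to describe the same instant. The sketch below checks this for the first training row, assuming the readings are in Hawaii Standard Time (UTC-10, no daylight saving); the helper name `unix_to_hawaii` is made up for this example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: local time is Hawaii Standard Time, a fixed UTC-10 offset.
HAWAII = timezone(timedelta(hours=-10))

def unix_to_hawaii(unix_seconds):
    """Convert seconds since the Unix epoch to a Hawaii-local datetime."""
    return datetime.fromtimestamp(unix_seconds, tz=HAWAII)

# First training row: UNIXTime 1473879005, Data 9/14/2016, Time 08:50:05.
local = unix_to_hawaii(1473879005)
print(local.strftime("%Y-%m-%d %H:%M:%S"))  # 2016-09-14 08:50:05
```

If the assumption holds for every row, `Data` and `Time` carry no information beyond `UNIXTime`, which is consistent with dropping them after the month and hour have been extracted.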


In [4]:
train_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26148 entries, 0 to 26147
Data columns (total 12 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Id                      26148 non-null  int64  
 1   UNIXTime                26148 non-null  int64  
 2   Data                    26148 non-null  object 
 3   Time                    26148 non-null  object 
 4   Temperature             26148 non-null  int64  
 5   Pressure                26148 non-null  float64
 6   Humidity                26148 non-null  int64  
 7   WindDirection(Degrees)  26148 non-null  float64
 8   Speed                   26148 non-null  float64
 9   TimeSunRise             26148 non-null  object 
 10  TimeSunSet              26148 non-null  object 
 11  Radiation               26148 non-null  float64
dtypes: float64(4), int64(4), object(4)
memory usage: 2.4+ MB


In [5]:
test_data = pd.read_csv('solar_test.csv')
test_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6538 entries, 0 to 6537
Data columns (total 11 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Id                      6538 non-null   int64  
 1   UNIXTime                6538 non-null   int64  
 2   Data                    6538 non-null   object 
 3   Time                    6538 non-null   object 
 4   Temperature             6538 non-null   int64  
 5   Pressure                6538 non-null   float64
 6   Humidity                6538 non-null   int64  
 7   WindDirection(Degrees)  6538 non-null   float64
 8   Speed                   6538 non-null   float64
 9   TimeSunRise             6538 non-null   object 
 10  TimeSunSet              6538 non-null   object 
dtypes: float64(3), int64(4), object(4)
memory usage: 562.0+ KB


In [6]:
train_data1 = train_data.copy()

In [7]:
train_data1['Data'] = pd.to_datetime(train_data1['Data'])

In [8]:
# An explicit format avoids ambiguous parsing of the hh:mm:ss strings
train_data1['Time'] = pd.to_datetime(train_data1['Time'], format='%H:%M:%S')

In [9]:
train_data1['Time'] = train_data1['Time'].dt.hour

In [10]:
train_data1['Month'] = train_data1['Data'].dt.month.astype(str)

In [11]:
train_data1['Time'].unique()

array([ 8,  7, 23,  0,  2, 22,  9, 11,  1, 17, 13,  5,  6, 18, 15, 20, 12,
       16, 21,  4,  3, 19, 14, 10], dtype=int64)

In [12]:
def daypart(hour):
    """Map an hour of the day (0-23) to one of six 4-hour buckets."""
    if hour in [2,3,4,5]:
        return "dawn"
    elif hour in [6,7,8,9]:
        return "morning"
    elif hour in [10,11,12,13]:
        return "noon"
    elif hour in [14,15,16,17]:
        return "afternoon"
    elif hour in [18,19,20,21]:
        return "evening"
    else:
        # Remaining hours: 22, 23, 0 and 1
        return "midnight"
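
A quick way to confirm that the buckets cover all 24 hours evenly is to count them. The sketch below rewrites the same mapping as an arithmetic lookup (an equivalent formulation chosen for brevity, not the notebook's code) and tallies the bucket sizes.

```python
from collections import Counter

def daypart(hour):
    """Same 4-hour buckets as above, expressed as an index lookup."""
    labels = ["dawn", "morning", "noon", "afternoon", "evening"]
    if 2 <= hour <= 21:
        return labels[(hour - 2) // 4]
    return "midnight"  # hours 22, 23, 0 and 1

# Count how many hours of the day fall into each bucket.
bucket_sizes = Counter(daypart(h) for h in range(24))
print(bucket_sizes)
```

Each label receives exactly four hours, which matches the roughly equal `value_counts()` seen for `Time2`.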

In [13]:
train_data1['Time2'] = train_data1['Time'].apply(daypart)

In [14]:
train_data1['Time2'].value_counts()

dawn         4414
morning      4376
evening      4373
afternoon    4343
noon         4335
midnight     4307
Name: Time2, dtype: int64

In [15]:
train_data1['Month'].value_counts()

10    6995
11    6657
12    6527
9     5969
Name: Month, dtype: int64

In [16]:
train_data1.columns

Index(['Id', 'UNIXTime', 'Data', 'Time', 'Temperature', 'Pressure', 'Humidity',
       'WindDirection(Degrees)', 'Speed', 'TimeSunRise', 'TimeSunSet',
       'Radiation', 'Month', 'Time2'],
      dtype='object')

In [17]:
colstobedrop = ['Id','Data','Time','TimeSunRise','TimeSunSet']

In [18]:
T2 = train_data1.drop(colstobedrop,axis=1)

In [19]:
T2 = T2.drop_duplicates(keep='first')
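
In this run `T2.info()` still reports 26148 rows, so no duplicates were removed. On data that does contain duplicates, features and target need to come from the same de-duplicated frame, otherwise their lengths differ. The toy frame below is invented for illustration.

```python
import pandas as pd

# Toy data with one exact duplicate row (values are illustrative).
df = pd.DataFrame({
    "Temperature": [57, 57, 50],
    "Radiation": [680.04, 680.04, 277.37],
})

deduped = df.drop_duplicates(keep="first")
X = deduped.drop(columns="Radiation")

y_from_deduped = deduped["Radiation"]   # same rows as X
y_from_original = df["Radiation"]       # still contains the dropped row

print(len(X), len(y_from_deduped), len(y_from_original))
```

Taking the target from the de-duplicated frame keeps the row counts and index labels of `X` and `y` in step.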

In [20]:
T2.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 26148 entries, 0 to 26147
Data columns (total 9 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   UNIXTime                26148 non-null  int64  
 1   Temperature             26148 non-null  int64  
 2   Pressure                26148 non-null  float64
 3   Humidity                26148 non-null  int64  
 4   WindDirection(Degrees)  26148 non-null  float64
 5   Speed                   26148 non-null  float64
 6   Radiation               26148 non-null  float64
 7   Month                   26148 non-null  object 
 8   Time2                   26148 non-null  object 
dtypes: float64(4), int64(3), object(2)
memory usage: 2.0+ MB


In [21]:
#sns.boxplot(data = X, x='UNIXTime')
sns.boxplot(data = train_data1, x='Radiation')

<AxesSubplot:xlabel='Radiation'>

In [22]:
import dtale
d = dtale.show(train_data1)
d.open_browser()

2021-11-22 12:08:57,943 - INFO     - NumExpr defaulting to 4 threads.


In [23]:
T2.head()

     UNIXTime  Temperature  Pressure  Humidity  WindDirection(Degrees)  Speed  Radiation Month     Time2
0  1473879005           57     30.45        68                   26.70   4.50     680.04     9   morning
1  1476293121           50     30.47        96                  144.96  10.12     277.37    10   morning
2  1477993220           47     30.48        56                  119.52   3.37       1.29    10  midnight
3  1473013505           57     30.47        93                   38.61   2.25     544.75     9   morning
4  1481885434           41     30.23       103                  177.55   2.25       1.22    12  midnight


In [24]:
sns.pairplot(T2)

<seaborn.axisgrid.PairGrid at 0x1a29c19eee0>

In [25]:
X = T2.drop(['Radiation'],axis=1)
# Take the target from T2 so it stays aligned with X after drop_duplicates
y = T2['Radiation']
print(X.shape)
print(y.shape)
X = pd.get_dummies(X)
print(X.shape)
print(y.shape)
X.head()

(26148, 8)
(26148,)
(26148, 16)
(26148,)


Unnamed: 0,UNIXTime,Temperature,Pressure,Humidity,WindDirection(Degrees),Speed,Month_10,Month_11,Month_12,Month_9,Time2_afternoon,Time2_dawn,Time2_evening,Time2_midnight,Time2_morning,Time2_noon
0,1473879005,57,30.45,68,26.7,4.5,0,0,0,1,0,0,0,0,1,0
1,1476293121,50,30.47,96,144.96,10.12,1,0,0,0,0,0,0,0,1,0
2,1477993220,47,30.48,56,119.52,3.37,1,0,0,0,0,0,0,1,0,0
3,1473013505,57,30.47,93,38.61,2.25,0,0,0,1,0,0,0,0,1,0
4,1481885434,41,30.23,103,177.55,2.25,0,0,1,0,0,0,0,1,0,0


In [26]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=42)
print(X_train.shape,y_train.shape,X_test.shape,y_test.shape)

(20918, 16) (20918,) (5230, 16) (5230,)


In [27]:
SS = StandardScaler()
X_train[X_train.columns] = SS.fit_transform(X_train)
X_test[X_test.columns] = SS.transform(X_test)
X_train.head()

Unnamed: 0,UNIXTime,Temperature,Pressure,Humidity,WindDirection(Degrees),Speed,Month_10,Month_11,Month_12,Month_9,Time2_afternoon,Time2_dawn,Time2_evening,Time2_midnight,Time2_morning,Time2_noon
7901,0.412995,-0.504279,1.043493,-1.606783,0.003575,2.080778,-0.606183,1.702638,-0.573781,-0.542098,-0.448265,-0.449418,2.236196,-0.441557,-0.44788,-0.448957
4705,1.302606,-0.182251,-3.530261,-1.836991,0.921634,4.659543,-0.606183,-0.587324,1.742827,-0.542098,-0.448265,-0.449418,2.236196,-0.441557,-0.44788,-0.448957
10408,-0.683953,-0.021237,0.128742,0.848772,0.370318,-0.821765,1.649667,-0.587324,-0.573781,-0.542098,-0.448265,-0.449418,2.236196,-0.441557,-0.44788,-0.448957
4510,-0.589939,-0.504279,0.128742,-0.148797,0.472151,0.146705,1.649667,-0.587324,-0.573781,-0.542098,-0.448265,2.225099,-0.447188,-0.441557,-0.44788,-0.448957
19082,1.684267,-1.792392,-1.883709,0.541828,0.501933,-1.142677,-0.606183,-0.587324,1.742827,-0.542098,-0.448265,-0.449418,-0.447188,2.264712,-0.44788,-0.448957


In [28]:
# Model 1- LinearRegression
LR = LinearRegression()
LR.fit(X_train,y_train)
LR_Train = LR.predict(X_train)
LR_Test = LR.predict(X_test)
print(np.sqrt(mean_squared_error(y_train,LR_Train)))
print(np.sqrt(mean_squared_error(y_test,LR_Test)))

153.79010339312575
155.82446924797625


In [29]:
# Model 2 - DecisionTreeRegressor
'''GSV1 = GridSearchCV(estimator=DecisionTreeRegressor(random_state=42),cv=10,param_grid=dict(max_depth=np.arange(1,50,1)))
GSV1.fit(X_train,y_train)
print(GSV1.best_params_)
'''
DT = DecisionTreeRegressor(random_state=42,max_depth=12)
DT.fit(X_train,y_train)
DT_Train = DT.predict(X_train)
DT_Test = DT.predict(X_test)
print(np.sqrt(mean_squared_error(y_train,DT_Train)))
print(np.sqrt(mean_squared_error(y_test,DT_Test)))

67.66023305261128
109.04574736329516
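
When `GridSearchCV` is used without a `scoring` argument, regressors are ranked by their default score (R²). Since this project is evaluated on RMSE, the search can be told to optimise that metric directly. The sketch below uses synthetic data and a small, arbitrary depth grid; both are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (not the solar dataset).
rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 10, size=(300, 3))
y_demo = np.sin(X_demo[:, 0]) * 100 + rng.normal(0, 5, size=300)

search = GridSearchCV(
    estimator=DecisionTreeRegressor(random_state=42),
    param_grid={"max_depth": [2, 4, 8, 12]},
    scoring="neg_root_mean_squared_error",  # scikit-learn maximises, so RMSE is negated
    cv=5,
)
search.fit(X_demo, y_demo)

best_depth = search.best_params_["max_depth"]
best_cv_rmse = -search.best_score_  # flip the sign back to a positive RMSE
print(best_depth, best_cv_rmse)
```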


In [30]:
# Model 3 - RandomForestRegressor
'''GSV3 = GridSearchCV(estimator=RandomForestRegressor(random_state=42),cv=10,param_grid=dict(n_estimators=np.arange(140,300,20)))
GSV3.fit(X_train,y_train)
print(GSV3.best_params_)
'''
RF = RandomForestRegressor(random_state=42,n_estimators=180)
RF.fit(X_train,y_train)
RF_Train = RF.predict(X_train)
RF_Test = RF.predict(X_test)
print(np.sqrt(mean_squared_error(y_train,RF_Train)))
print(np.sqrt(mean_squared_error(y_test,RF_Test)))

32.621651512265885
87.21406638774026
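
The gap between training and hold-out RMSE suggests the forest fits the training split much more closely than unseen data. A single split can be noisy, so a cross-validated estimate is often a useful second opinion. The sketch below runs `cross_val_score` on synthetic data with a small forest; the data and the `n_estimators=30` setting are assumptions chosen to keep the example fast.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic regression data (not the solar dataset).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 4))
y_demo = 50 * X_demo[:, 0] + rng.normal(0, 10, size=200)

model = RandomForestRegressor(n_estimators=30, random_state=42)

# One RMSE per fold; the scorer returns negated values, so flip the sign.
fold_rmse = -cross_val_score(
    model, X_demo, y_demo, cv=5, scoring="neg_root_mean_squared_error"
)
print(fold_rmse.mean(), fold_rmse.std())
```

The spread across folds gives a rough sense of how much a single hold-out figure could move by chance.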


In [31]:
Model1 = SelectFromModel(estimator=RandomForestRegressor(random_state=42,n_estimators=180))
Model1.fit(X_train,y_train)
selected_features = X_train.columns[Model1.get_support()].to_list()
print(selected_features)
print(np.round(Model1.threshold_,decimals=2))

['UNIXTime', 'Temperature', 'Time2_noon']
0.06
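
The printed threshold of 0.06 is consistent with `SelectFromModel`'s default for tree ensembles: the mean feature importance. Random forest importances sum to 1, so with 16 features the mean is 1/16 = 0.0625, which rounds to 0.06. The sketch below reproduces this on synthetic data (an assumption for illustration) with a smaller forest.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

# Synthetic data with 16 features, only the first two informative.
rng = np.random.default_rng(0)
n_features = 16
X_demo = rng.normal(size=(300, n_features))
y_demo = 3 * X_demo[:, 0] + 2 * X_demo[:, 1] + rng.normal(scale=0.1, size=300)

selector = SelectFromModel(RandomForestRegressor(n_estimators=30, random_state=42))
selector.fit(X_demo, y_demo)

# With no explicit threshold, features above the mean importance are kept.
print(selector.threshold_, 1 / n_features)
kept = np.flatnonzero(selector.get_support())
print(kept)
```

A feature is therefore selected only if it carries more than an equal share of the total importance, which explains why only three columns survive in the notebook.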


In [32]:
'''GSV4 = GridSearchCV(estimator=XGBRegressor(random_state=42),cv=10,param_grid=dict(n_estimators=np.arange(1,400,100)))
GSV4.fit(X_train,y_train)
print(GSV4.best_params_)'''


xg = XGBRegressor(n_estimators=232)
xg.fit(X_train,y_train)
xg_train = xg.predict(X_train)
xg_test = xg.predict(X_test)
print(np.sqrt(mean_squared_error(y_train,xg_train)))
print(np.sqrt(mean_squared_error(y_test,xg_test)))
# xg.score(X_test, y_test) would report R^2 rather than RMSE

39.71355777683784
86.87101019958362
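
Collecting the RMSE values printed by the four model cells makes the comparison easier to read. The numbers below are copied from the outputs above and rounded to two decimals; the dictionary layout is just one convenient way to hold them.

```python
# (train RMSE, hold-out RMSE) as printed by the model cells above, rounded
reported = {
    "LinearRegression": (153.79, 155.82),
    "DecisionTreeRegressor": (67.66, 109.05),
    "RandomForestRegressor": (32.62, 87.21),
    "XGBRegressor": (39.71, 86.87),
}

# Print models sorted by hold-out RMSE, with the train/hold-out gap.
print(f"{'model':<24}{'train':>8}{'test':>8}{'gap':>8}")
for name, (train_rmse, test_rmse) in sorted(reported.items(), key=lambda kv: kv[1][1]):
    print(f"{name:<24}{train_rmse:>8.2f}{test_rmse:>8.2f}{test_rmse - train_rmse:>8.2f}")

best_model = min(reported, key=lambda name: reported[name][1])
```

On this single split XGBoost and the random forest are within about 0.35 W/m² of each other on the hold-out set, a difference small enough that either could come out ahead on another split.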


In [33]:
test_data1 = test_data.copy()
test_data1['Data'] = pd.to_datetime(test_data1['Data'])
test_data1['Time'] = pd.to_datetime(test_data1['Time'], format='%H:%M:%S')
test_data1['Month'] = test_data1['Data'].dt.month.astype('str')
test_data1['Time'] = test_data1['Time'].dt.hour
test_data1['Time2'] = test_data1['Time'].apply(daypart)

In [34]:
colstobedrop = ['Id','Data','Time','TimeSunRise','TimeSunSet']

In [35]:
test_data1 = test_data1.drop(colstobedrop,axis=1)

In [36]:
test_data1 = pd.get_dummies(test_data1)
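
Here the test set happens to contain every month and day part, so `get_dummies` yields the same 16 columns as in training. That is not guaranteed in general. A defensive pattern is to reindex the encoded test frame to the training columns, as sketched below on toy frames (invented for illustration).

```python
import pandas as pd

# Toy frames: the test set lacks months 9 and 12.
train = pd.DataFrame({"Speed": [4.5, 10.12, 3.37], "Month": ["9", "10", "12"]})
test = pd.DataFrame({"Speed": [6.75, 5.62], "Month": ["10", "10"]})

train_X = pd.get_dummies(train)

# Reindex to the training columns; categories absent from test become all-zero columns.
test_X = pd.get_dummies(test).reindex(columns=train_X.columns, fill_value=0)

print(list(train_X.columns))
print(test_X)
```

This keeps both the set and the order of columns identical to what the model was fitted on.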

In [37]:
test_data1

Unnamed: 0,UNIXTime,Temperature,Pressure,Humidity,WindDirection(Degrees),Speed,Month_10,Month_11,Month_12,Month_9,Time2_afternoon,Time2_dawn,Time2_evening,Time2_midnight,Time2_morning,Time2_noon
0,1478720107,59,30.47,44,312.67,3.37,0,1,0,0,0,0,0,0,1,0
1,1474063503,59,30.48,83,38.01,6.75,0,0,0,1,0,0,0,0,0,1
2,1476109221,47,30.39,78,213.62,5.62,1,0,0,0,0,1,0,0,0,0
3,1481475056,45,30.40,98,176.63,4.50,0,0,1,0,0,0,0,0,1,0
4,1477493117,45,30.40,34,175.89,6.75,1,0,0,0,0,1,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6533,1478161518,46,30.46,38,176.09,9.00,0,1,0,0,0,0,0,1,0,0
6534,1474048530,51,30.45,100,31.25,13.50,0,0,0,1,0,0,0,0,1,0
6535,1474197908,51,30.46,89,147.11,7.87,0,0,0,1,0,0,0,1,0,0
6536,1481776204,47,30.27,54,231.71,6.75,0,0,1,0,0,0,1,0,0,0


In [38]:
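SS1 = StandardScaler()

In [39]:
# Caution: this fits a second scaler on the test set, so the test features are
# not on the scale the models saw in training; SS.transform(test_data1) would
# reuse the training statistics instead.
test_data1[test_data1.columns] = SS1.fit_transform(test_data1)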
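
The usual convention is to fit the scaler once on the training data and only call `transform` on anything the model scores later. The sketch below shows the difference on synthetic data where the new batch is deliberately shifted; the array names are assumptions for this example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=50.0, scale=5.0, size=(200, 2))
X_new_demo = rng.normal(loc=60.0, scale=5.0, size=(50, 2))  # shifted on purpose

# Statistics come from the training data only.
scaler = StandardScaler().fit(X_train_demo)
X_new_scaled = scaler.transform(X_new_demo)

# Refitting on the new batch re-centres it at zero and hides the shift.
refit_scaled = StandardScaler().fit_transform(X_new_demo)

print(X_new_scaled.mean(axis=0), refit_scaled.mean(axis=0))
```

With the training scaler the shift stays visible (means near 2), whereas the refit scaler maps the batch back to mean 0, so a model trained on the first scale would see systematically different inputs.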

In [40]:
test_data1.columns

Index(['UNIXTime', 'Temperature', 'Pressure', 'Humidity',
       'WindDirection(Degrees)', 'Speed', 'Month_10', 'Month_11', 'Month_12',
       'Month_9', 'Time2_afternoon', 'Time2_dawn', 'Time2_evening',
       'Time2_midnight', 'Time2_morning', 'Time2_noon'],
      dtype='object')

In [42]:
test_data1.head()

Unnamed: 0,UNIXTime,Temperature,Pressure,Humidity,WindDirection(Degrees),Speed,Month_10,Month_11,Month_12,Month_9,Time2_afternoon,Time2_dawn,Time2_evening,Time2_midnight,Time2_morning,Time2_noon
0,0.222676,1.285096,0.861865,-1.19562,2.006103,-0.812187,-0.622512,1.737365,-0.577939,-0.533366,-0.451721,-0.455153,-0.463941,-0.432436,2.277154,-0.440631
1,-1.328678,1.285096,1.045454,0.305614,-1.268551,0.149263,-0.622512,-0.575584,-0.577939,1.874885,-0.451721,-0.455153,-0.463941,-0.432436,-0.439145,2.269475
2,-0.647144,-0.666163,-0.606841,0.113148,0.825172,-0.172169,1.606395,-0.575584,-0.577939,-0.533366,-0.451721,2.197065,-0.463941,-0.432436,-0.439145,-0.440631
3,1.140491,-0.991372,-0.423253,0.883011,0.384156,-0.490755,-0.622512,-0.575584,1.730286,-0.533366,-0.451721,-0.455153,-0.463941,-0.432436,2.277154,-0.440631
4,-0.186097,-0.991372,-0.423253,-1.580551,0.375333,0.149263,1.606395,-0.575584,-0.577939,-0.533366,-0.451721,2.197065,-0.463941,-0.432436,-0.439145,-0.440631


In [41]:
X_train.columns

Index(['UNIXTime', 'Temperature', 'Pressure', 'Humidity',
       'WindDirection(Degrees)', 'Speed', 'Month_10', 'Month_11', 'Month_12',
       'Month_9', 'Time2_afternoon', 'Time2_dawn', 'Time2_evening',
       'Time2_midnight', 'Time2_morning', 'Time2_noon'],
      dtype='object')

In [49]:
Final_op = RF.predict(test_data1)

In [50]:
Finalop1 = pd.DataFrame(test_data['Id'])

In [51]:
Finalop1['Radiation'] = Final_op

In [52]:
Finalop1['Radiation'].describe()

count    6538.000000
mean      206.357822
std       299.780717
min         1.171889
25%         1.228444
50%         4.044472
75%       372.622056
max      1098.063667
Name: Radiation, dtype: float64

In [250]:
train_data['Radiation'].describe()

count    26148.000000
mean       208.044780
std        316.090247
min          1.130000
25%          1.230000
50%          2.710000
75%        358.945000
max       1601.260000
Name: Radiation, dtype: float64

In [54]:
Finalop1.to_csv('submission.csv',header=False,index=False)
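
Because the file is written with `header=False`, anyone reading it back has to supply the column names. The sketch below round-trips a two-row frame through an in-memory buffer; the rows are made up and the buffer stands in for `submission.csv`.

```python
import io
import pandas as pd

# Made-up rows in the same two-column layout as the submission.
submission = pd.DataFrame({"Id": [6, 32683], "Radiation": [1.17, 372.62]})

buffer = io.StringIO()
submission.to_csv(buffer, header=False, index=False)

# Reading back: no header row, so names are passed explicitly.
buffer.seek(0)
restored = pd.read_csv(buffer, header=None, names=["Id", "Radiation"])
print(restored)
```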

In [56]:
Finalop1.describe()

                 Id    Radiation
count   6538.000000  6538.000000
mean   16362.182319   206.357822
std     9379.483802   299.780717
min        6.000000     1.171889
25%     8317.250000     1.228444
50%    16213.500000     4.044472
75%    24536.750000   372.622056
max    32683.000000  1098.063667
