<div class="alert alert-block alert-success">
    <h1 align="center">Scikit-Learn Tips</h1>
    <h3 align="center">Tip 01: Random State</h3>
    <h4 align="center"><a href="https://github.com/SMSajadi99/Practical-Machine-Learning">Seyed Mohammad Sajadi</a></h4>
</div>


Q: Why set a value for "random_state"?

A: It ensures that a "random" process produces the same result every time it runs, which makes your code reproducible (by you and others!)

See example 👇

<img src="https://cdn-coiao.nitrocdn.com/CYHudqJZsSxQpAPzLkHFOkuzFKDpEHGF/assets/static/optimized/rev-f6cb400/wp-content/uploads/2022/05/sklearn-train-test-split_syntax-explanation_v2.png" alt="train_test_split syntax explanation">

In [1]:
import sklearn
sklearn.__version__

'1.2.2'

In [4]:
import pandas as pd
df = pd.read_csv(r'titanic.csv')
df

Unnamed: 0,pclass,survived,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest
0,1,1,"Allen, Miss. Elisabeth Walton",female,29.00,0,0,24160,211.3375,B5,S,2,,"St Louis, MO"
1,1,1,"Allison, Master. Hudson Trevor",male,0.92,1,2,113781,151.5500,C22 C26,S,11,,"Montreal, PQ / Chesterville, ON"
2,1,0,"Allison, Miss. Helen Loraine",female,2.00,1,2,113781,151.5500,C22 C26,S,,,"Montreal, PQ / Chesterville, ON"
3,1,0,"Allison, Mr. Hudson Joshua Creighton",male,30.00,1,2,113781,151.5500,C22 C26,S,,135.0,"Montreal, PQ / Chesterville, ON"
4,1,0,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25.00,1,2,113781,151.5500,C22 C26,S,,,"Montreal, PQ / Chesterville, ON"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1304,3,0,"Zabour, Miss. Hileni",female,14.50,1,0,2665,14.4542,,C,,328.0,
1305,3,0,"Zabour, Miss. Thamine",female,,1,0,2665,14.4542,,C,,,
1306,3,0,"Zakarian, Mr. Mapriededer",male,26.50,0,0,2656,7.2250,,C,,304.0,
1307,3,0,"Zakarian, Mr. Ortin",male,27.00,0,0,2670,7.2250,,C,,,


In [5]:
cols = ['fare', 'embarked', 'sex']
X = df[cols]
y = df['survived']

In [6]:
from sklearn.model_selection import train_test_split

In [7]:
X

Unnamed: 0,fare,embarked,sex
0,211.3375,S,female
1,151.5500,S,male
2,151.5500,S,female
3,151.5500,S,male
4,151.5500,S,female
...,...,...,...
1304,14.4542,C,female
1305,14.4542,C,female
1306,7.2250,C,male
1307,7.2250,C,male


In [8]:
# any non-negative integer (0 included) can be used for the random_state value
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)
X_train.head()

Unnamed: 0,fare,embarked,sex
307,77.2875,S,male
193,211.3375,S,female
646,31.3875,S,female
502,19.5,S,female
33,26.55,S,female


In [9]:
# using the SAME random_state value results in the SAME random split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)
X_train.head()

Unnamed: 0,fare,embarked,sex
307,77.2875,S,male
193,211.3375,S,female
646,31.3875,S,female
502,19.5,S,female
33,26.55,S,female
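
The two outputs above are compared by eye on the first five rows only. A minimal sketch of checking the whole split programmatically is shown here; it uses a small synthetic DataFrame, so `demo`, `X_demo`, and `y_demo` are made-up stand-ins for illustration rather than the Titanic data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the Titanic columns
demo = pd.DataFrame({
    "fare": np.arange(20, dtype=float),
    "survived": np.arange(20) % 2,
})
X_demo, y_demo = demo[["fare"]], demo["survived"]

# Two independent calls with the same seed
split_a = train_test_split(X_demo, y_demo, test_size=0.5, random_state=1)
split_b = train_test_split(X_demo, y_demo, test_size=0.5, random_state=1)

# Compare every returned piece (X_train, X_test, y_train, y_test)
same_split = all(a.equals(b) for a, b in zip(split_a, split_b))
print(same_split)
```

`same_split` should come out as `True`, since all four returned objects match row for row, not just their heads.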


In [10]:
# using a DIFFERENT random_state value results in a DIFFERENT random split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=2)
X_train.head()

Unnamed: 0,fare,embarked,sex
565,13.0,S,male
1181,9.325,S,male
680,7.225,C,male
644,31.3875,S,male
340,39.0,S,female
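
The three cases above (same seed, different seed, and no seed at all) can be sketched in one self-contained snippet. The array `rows` and the helper `train_rows` are hypothetical names introduced for illustration; the only library behavior relied on is that `random_state` defaults to `None` in `train_test_split`, in which case NumPy's global random state is used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 row labels to split
rows = np.arange(100)

def train_rows(seed):
    """Return the training half of the split for a given random_state."""
    train, _ = train_test_split(rows, test_size=0.5, random_state=seed)
    return train.tolist()

# Same seed: identical split. Different seed: a different split.
same = train_rows(1) == train_rows(1)
different = train_rows(1) != train_rows(2)

# Changing the seed changes which rows are picked, not how many
sizes = {len(train_rows(seed)) for seed in (0, 1, 2)}

# random_state=None (the default) draws on NumPy's global random state,
# so the split typically changes from one call to the next
unseeded_a, unseeded_b = train_rows(None), train_rows(None)

print(same, different, sizes)
```

Leaving `random_state` unset is fine for quick exploration, but any result you intend to share or revisit is easier to reproduce with a fixed integer.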
