# Decision Tree

A `Decision Tree` is a predictive tree model used in statistics, data mining, and ML. The algorithm serves as the predictive model in both regression and classification trees.
In such a tree, the leaves represent class labels and the branches represent the feature values that lead to those labels. A `Regression Tree` predicts a real-valued target. A `Classification Tree` returns class values (here 0 and 1) and decides according to the number of votes, that is, the majority class.
A `Decision Tree` is usually built top-down: at each step the algorithm picks the variable that best splits the current set of items. This approach is typically convenient when working with small datasets.

## Decision Tree Regressor                                                 

<img src="dtr.png">

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import cv2

### First, load the data

In [2]:
data = pd.read_csv("DTree.csv")
data

Unnamed: 0,Ob-Havo,Temp,Namlik,Shamol,Oyinchilar
0,Yomg'ir,Issiq,Yuqori,False,25
1,Yomg'ir,Issiq,Yuqori,True,30
2,Bulutli,Issiq,Yuqori,False,46
3,Quyoshli,Yaxshi,Yuqori,False,45
4,Quyoshli,Salqin,Normal,False,52
5,Quyoshli,Salqin,Normal,True,23
6,Bulutli,Salqin,Normal,True,43
7,Yomg'ir,Yaxshi,Yuqori,False,35
8,Yomg'ir,Salqin,Normal,False,38
9,Quyoshli,Yaxshi,Normal,False,46
10,Yomg'ir,Yaxshi,Normal,True,48
11,Bulutli,Yaxshi,Yuqori,True,52
12,Bulutli,Issiq,Normal,False,44
13,Quyoshli,Yaxshi,Yuqori,True,30


`Extract the target values (Oyinchilar, the number of players) and convert them to a NumPy array`

In [3]:
uyinchi = data["Oyinchilar"]
uyinchi = uyinchi.to_numpy()
uyinchi

array([25, 30, 46, 45, 52, 23, 43, 35, 38, 46, 48, 52, 44, 30])

`The standard deviation is the square root of the mean of the squared deviations from the mean, i.e.
std = (sum(abs(y - y.mean())**2)/len(y))**0.5`

In [5]:
sto = uyinchi.std()
sto

9.321086474291743

`Mean of the target values`

In [7]:
urtacha = np.mean(uyinchi)
urtacha

39.785714285714285

`CV (coefficient of variation) is the ratio of the standard deviation to the mean,
expressed as a percentage. CV = (target.std()/mean(target)) * 100`

In [8]:
cv = (sto / urtacha)*100
cv

23.428224531433468
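The standard deviation and CV above can also be reproduced without NumPy. The following is a minimal sketch using only the standard library; the helper names `population_std` and `coefficient_of_variation` are illustrative and do not appear in the notebook.

```python
import math

# Target values copied from the `uyinchi` array shown above.
players = [25, 30, 46, 45, 52, 23, 43, 35, 38, 46, 48, 52, 44, 30]


def population_std(values):
    """Population standard deviation (divides by n, like ndarray.std())."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def coefficient_of_variation(values):
    """Standard deviation expressed as a percentage of the mean."""
    mean = sum(values) / len(values)
    return population_std(values) / mean * 100


std = population_std(players)
cv = coefficient_of_variation(players)
print(round(std, 6), round(cv, 6))
```

Note that `DataFrame.describe()` reports the sample standard deviation (dividing by n - 1), which is why it shows a slightly larger value than `ndarray.std()` for the same column.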

### Step 1. Compute the standard deviation within each of the four columns

* `Group the data by Ob-Havo (weather): Bulutli (overcast), Yomg'ir (rainy), Quyoshli (sunny)` 
* `For each group, compute the standard deviation and the number of rows, then sort by Std`

In [9]:
data_obhavo = data.groupby("Ob-Havo")
data_obhavo = data_obhavo.agg(Std = ("Oyinchilar", lambda x: x.std(ddof=0)), Soni = ("Oyinchilar", np.count_nonzero)).sort_values("Std")
data_obhavo

Unnamed: 0_level_0,Std,Soni
Ob-Havo,Unnamed: 1_level_1,Unnamed: 2_level_1
Bulutli,3.49106,4
Yomg'ir,7.782031,5
Quyoshli,10.870143,5


* `Group the data by Temp (temperature): Yaxshi (mild), Issiq (hot), Salqin (cool)`
* `For each group, compute the standard deviation and the number of rows, then sort by Std`

In [10]:
data_temp = data.groupby("Temp")
data_temp = data_temp.agg(Std = ("Oyinchilar", lambda x: x.std(ddof=0)), Soni = ("Oyinchilar", np.count_nonzero)).sort_values("Std")
data_temp

Unnamed: 0_level_0,Std,Soni
Temp,Unnamed: 1_level_1,Unnamed: 2_level_1
Yaxshi,7.65216,6
Issiq,8.954747,4
Salqin,10.511898,4


* `Group the data by Namlik (humidity): Normal, Yuqori (high)`
* `For each group, compute the standard deviation and the number of rows, then sort by Std`

In [11]:
data_namlik = data.groupby("Namlik")
data_namlik = data_namlik.agg(Std = ("Oyinchilar", lambda x: x.std(ddof=0)), Soni = ("Oyinchilar", np.count_nonzero)).sort_values("Std")
data_namlik

Unnamed: 0_level_0,Std,Soni
Namlik,Unnamed: 1_level_1,Unnamed: 2_level_1
Normal,8.734169,7
Yuqori,9.363411,7


* `Group the data by Shamol (wind): True, False`
* `For each group, compute the standard deviation and the number of rows, then sort by Std`

In [12]:
data_shamol = data.groupby("Shamol")
data_shamol = data_shamol.agg(Std = ("Oyinchilar", lambda x: x.std(ddof=0)), Soni = ("Oyinchilar", np.count_nonzero)).sort_values("Std")
data_shamol

Unnamed: 0_level_0,Std,Soni
Shamol,Unnamed: 1_level_1,Unnamed: 2_level_1
False,7.873016,8
True,10.593499,6


### Step 2. Compute the standard deviation reduction for each column

* `For each group, divide its row count by the total number of rows and multiply by the group's standard deviation; then subtract the sum of these products from the STD of the Oyinchilar column (9.321086474291743)`

* Ob-Havo

In [13]:
stdr_obhavo = uyinchi.std() - np.dot(data_obhavo["Soni"]/14,data_obhavo["Std"])
stdr_obhavo

1.6621503366302335

* Temp

In [14]:
stdr_temp = uyinchi.std() - np.dot(data_temp["Soni"]/14,data_temp["Std"])
stdr_temp

0.4796905747633211

* Namlik

In [15]:
stdr_namlik = uyinchi.std() - np.dot(data_namlik["Soni"]/14,data_namlik["Std"])
stdr_namlik

0.272296195489826

* Shamol

In [16]:
stdr_shamol = uyinchi.std() - np.dot(data_shamol["Soni"]/14,data_shamol["Std"])
stdr_shamol

0.28214938055733185

### `Conclusion:` Of the values computed above (Ob-Havo = 1.6621503366302335, Temp = 0.4796905747633211, Namlik = 0.272296195489826, Shamol = 0.28214938055733185), Ob-Havo is the largest, so it becomes the root of the tree

<img src="1.png">
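Steps 1 and 2 can be folded into one function. The sketch below uses only the standard library; the 14 rows are reconstructed from the outputs shown across this notebook (rows 10 to 13 come from the later filtered tables and the `uyinchi` array), and the names `pstd`, `sdr`, and `reductions` are illustrative rather than part of the notebook.

```python
import math
from collections import defaultdict

COLUMNS = ("Ob-Havo", "Temp", "Namlik", "Shamol", "Oyinchilar")
ROWS = [
    ("Yomg'ir", "Issiq", "Yuqori", False, 25),
    ("Yomg'ir", "Issiq", "Yuqori", True, 30),
    ("Bulutli", "Issiq", "Yuqori", False, 46),
    ("Quyoshli", "Yaxshi", "Yuqori", False, 45),
    ("Quyoshli", "Salqin", "Normal", False, 52),
    ("Quyoshli", "Salqin", "Normal", True, 23),
    ("Bulutli", "Salqin", "Normal", True, 43),
    ("Yomg'ir", "Yaxshi", "Yuqori", False, 35),
    ("Yomg'ir", "Salqin", "Normal", False, 38),
    ("Quyoshli", "Yaxshi", "Normal", False, 46),
    ("Yomg'ir", "Yaxshi", "Normal", True, 48),
    ("Bulutli", "Yaxshi", "Yuqori", True, 52),
    ("Bulutli", "Issiq", "Normal", False, 44),
    ("Quyoshli", "Yaxshi", "Yuqori", True, 30),
]
rows = [dict(zip(COLUMNS, r)) for r in ROWS]


def pstd(values):
    """Population standard deviation (ddof=0)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def sdr(rows, feature, target="Oyinchilar"):
    """Parent std minus the size-weighted std of each group of `feature`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row[target])
    weighted = sum(len(g) / len(rows) * pstd(g) for g in groups.values())
    return pstd([row[target] for row in rows]) - weighted


reductions = {feature: sdr(rows, feature) for feature in COLUMNS[:-1]}
best = max(reductions, key=reductions.get)
print(best, reductions)
```

Because `sdr` takes any list of rows, the same call works unchanged on a filtered subset, which is what the later steps do by hand.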

### Step 3.

In [59]:
data[["Ob-Havo","Oyinchilar"]]

Unnamed: 0,Ob-Havo,Oyinchilar
0,Yomg'ir,25
1,Yomg'ir,30
2,Bulutli,46
3,Quyoshli,45
4,Quyoshli,52
5,Quyoshli,23
6,Bulutli,43
7,Yomg'ir,35
8,Yomg'ir,38
9,Quyoshli,46
10,Yomg'ir,48
11,Bulutli,52
12,Bulutli,44
13,Quyoshli,30


In [17]:
# Bulutli - 4 - 185/4
# Yomg'ir - 5 - 176/5
# Quyoshli - 5 - 196/5

# Bulutli 	3.491060 	4
# Yomg'ir 	7.782031 	5
# Quyoshli 	10.870143 	5
b_ur = 185/4 # mean of the Bulutli group
y_ur = 176/5 # mean of the Yomg'ir group
q_ur = 196/5 # mean of the Quyoshli group
bulut = (3.491060/b_ur)*100 # CV of the Bulutli group
yomgir = (7.782031/y_ur)*100 # CV of the Yomg'ir group
quyosh = (10.870143/q_ur)*100 # CV of the Quyoshli group


* `Ob-Havo` is the root, so we compute the mean and CV of each of its groups. A branch stops growing when its CV is below 10% or its group has fewer than 3 rows.

In [18]:
data_obhavo["AVG"] = b_ur,y_ur,q_ur
data_obhavo["CV"] = bulut,yomgir,quyosh
data_obhavo

Unnamed: 0_level_0,Std,Soni,AVG,CV
Ob-Havo,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Bulutli,3.49106,4,46.25,7.548238
Yomg'ir,7.782031,5,35.2,22.108043
Quyoshli,10.870143,5,39.2,27.729957


<img src="cv.png">

* Above, the CV of the Bulutli group is below 10%, so we stop there and use its mean, 46.25, as the prediction

<img src="2.png">
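The stopping rule can be written as a small predicate. This is a sketch under the thresholds stated above (CV below 10% or fewer than 3 rows); the function name `should_stop` and its parameter names are hypothetical, and it assumes the group mean is positive.

```python
import math


def should_stop(values, cv_limit=10.0, min_rows=3):
    """Return True when a group should become a leaf."""
    if len(values) < min_rows:
        return True
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return std / mean * 100 < cv_limit


# Oyinchilar values of the three Ob-Havo groups.
bulutli = [46, 43, 52, 44]
yomgir = [25, 30, 35, 38, 48]
quyoshli = [45, 52, 23, 46, 30]

print(should_stop(bulutli), should_stop(yomgir), should_stop(quyoshli))
```

Only Bulutli satisfies the rule, so the Yomg'ir and Quyoshli branches have to be split further.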

### Step 4. Repeat the calculations above for the rows where Ob-Havo is Quyoshli

In [19]:
data_quyosh = data[data["Ob-Havo"]=="Quyoshli"]
data_quyosh

Unnamed: 0,Ob-Havo,Temp,Namlik,Shamol,Oyinchilar
3,Quyoshli,Yaxshi,Yuqori,False,45
4,Quyoshli,Salqin,Normal,False,52
5,Quyoshli,Salqin,Normal,True,23
9,Quyoshli,Yaxshi,Normal,False,46
13,Quyoshli,Yaxshi,Yuqori,True,30


* `Target values when Ob-Havo is Quyoshli`

In [20]:
uyinchi_quyosh = data_quyosh["Oyinchilar"]
uyinchi_quyosh = uyinchi_quyosh.to_numpy()
uyinchi_quyosh

array([45, 52, 23, 46, 30])

In [21]:
uyinchi_quyosh.std()

10.870142593360953

In [22]:
# 1. Group the data by Temp (Yaxshi, Salqin)
data_temp_quyosh = data_quyosh.groupby("Temp")

# 2. For each group, compute the standard deviation and the number of rows, then sort by Std
data_temp_quyosh = data_temp_quyosh.agg(Std = ("Oyinchilar", lambda x: x.std(ddof=0)), Soni = ("Oyinchilar", np.count_nonzero)).sort_values("Std")
data_temp_quyosh

Unnamed: 0_level_0,Std,Soni
Temp,Unnamed: 1_level_1,Unnamed: 2_level_1
Yaxshi,7.318166,3
Salqin,14.5,2


In [23]:
# 1. Group the data by Namlik (Normal, Yuqori)
data_namlik_quyosh = data_quyosh.groupby("Namlik")

# 2. For each group, compute the standard deviation and the number of rows, then sort by Std
data_namlik_quyosh = data_namlik_quyosh.agg(Std = ("Oyinchilar", lambda x: x.std(ddof=0)), Soni = ("Oyinchilar", np.count_nonzero)).sort_values("Std")
data_namlik_quyosh

Unnamed: 0_level_0,Std,Soni
Namlik,Unnamed: 1_level_1,Unnamed: 2_level_1
Yuqori,7.5,2
Normal,12.498889,3


In [24]:
# 1. Group the data by Shamol (True, False)
data_shamol_quyosh = data_quyosh.groupby("Shamol")

# 2. For each group, compute the standard deviation and the number of rows, then sort by Std
data_shamol_quyosh = data_shamol_quyosh.agg(Std = ("Oyinchilar", lambda x: x.std(ddof=0)), Soni = ("Oyinchilar", np.count_nonzero)).sort_values("Std")
data_shamol_quyosh

Unnamed: 0_level_0,Std,Soni
Shamol,Unnamed: 1_level_1,Unnamed: 2_level_1
False,3.091206,3
True,3.5,2


### Now compute the standard deviation reduction

* `Temp`

In [25]:
# Within the Quyoshli rows: divide each group's row count by the total number of rows, multiply by the group's
# standard deviation, and subtract the sum from the STD of the Oyinchilar column (10.870142593360953)
stdr_temp_quyosh = uyinchi_quyosh.std() - np.dot(data_temp_quyosh["Soni"]/5,data_temp_quyosh["Std"])
stdr_temp_quyosh

0.6792429133409215

* `Namlik`

In [26]:
# Within the Quyoshli rows: divide each group's row count by the total number of rows, multiply by the group's
# standard deviation, and subtract the sum from the STD of the Oyinchilar column (10.870142593360953)
stdr_namlik_quyosh = uyinchi_quyosh.std() - np.dot(data_namlik_quyosh["Soni"]/5,data_namlik_quyosh["Std"])
stdr_namlik_quyosh

0.37080928965988313

* `Shamol`

In [27]:
# Within the Quyoshli rows: divide each group's row count by the total number of rows, multiply by the group's
# standard deviation, and subtract the sum from the STD of the Oyinchilar column (10.870142593360953)
stdr_shamol_quyosh = uyinchi_quyosh.std() - np.dot(data_shamol_quyosh["Soni"]/5,data_shamol_quyosh["Std"])
stdr_shamol_quyosh

7.615418894261811

### `Conclusion:` Of the values computed above (Temp = 0.6792429133409215, Namlik = 0.37080928965988313, Shamol = 7.615418894261811), Shamol is the largest, so it becomes the next split of the tree

<img src="3.png">

### Step 5

In [28]:
data_quyosh[["Shamol","Oyinchilar"]]

Unnamed: 0,Shamol,Oyinchilar
3,False,45
4,False,52
5,True,23
9,False,46
13,True,30


In [30]:
# False- 3 - 143/3
# True - 2 - 53/2

# False 	3.091206 	3
# True 	    3.500000 	2

fa_ur = 143/3 # mean of the False group
tr_ur = 53/2  # mean of the True group
false = (3.091206/fa_ur)*100 # CV of the False group
true = (3.500000/tr_ur)*100 # CV of the True group

* `Shamol` is the split column here, so we compute the mean and CV of each of its groups. A branch stops growing when its CV is below 10% or its group has fewer than 3 rows.

In [31]:
data_shamol_quyosh["AVG"] = fa_ur,tr_ur
data_shamol_quyosh["CV"] = false, true
data_shamol_quyosh

Unnamed: 0_level_0,Std,Soni,AVG,CV
Shamol,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
False,3.091206,3,47.666667,6.485048
True,3.5,2,26.5,13.207547


* Above, the CV of the False group is below 10% and the True group has fewer than 3 rows, so we stop with False = 47.666667 and True = 26.5

<img src="4.png">

### Step 6. Repeat the calculations above for the rows where Ob-Havo is Yomg'ir

In [32]:
data_yomgir = data[data["Ob-Havo"]=="Yomg'ir"]
data_yomgir

Unnamed: 0,Ob-Havo,Temp,Namlik,Shamol,Oyinchilar
0,Yomg'ir,Issiq,Yuqori,False,25
1,Yomg'ir,Issiq,Yuqori,True,30
7,Yomg'ir,Yaxshi,Yuqori,False,35
8,Yomg'ir,Salqin,Normal,False,38
10,Yomg'ir,Yaxshi,Normal,True,48


* `Target values when Ob-Havo is Yomg'ir`

In [33]:
uyinchi_yomgir = data_yomgir["Oyinchilar"]
uyinchi_yomgir = uyinchi_yomgir.to_numpy()
uyinchi_yomgir

array([25, 30, 35, 38, 48])

In [34]:
uyinchi_yomgir.std()

7.782030583337487

In [35]:
# 1. Group the data by Temp (Yaxshi, Issiq, Salqin)
data_temp_yomgir = data_yomgir.groupby("Temp")

# 2. For each group, compute the standard deviation and the number of rows, then sort by Std
data_temp_yomgir = data_temp_yomgir.agg(Std = ("Oyinchilar", lambda x: x.std(ddof=0)), Soni = ("Oyinchilar", np.count_nonzero)).sort_values("Std")
data_temp_yomgir

Unnamed: 0_level_0,Std,Soni
Temp,Unnamed: 1_level_1,Unnamed: 2_level_1
Salqin,0.0,1
Issiq,2.5,2
Yaxshi,6.5,2


In [36]:
# 1. Group the data by Namlik (Normal, Yuqori)
data_namlik_yomgir = data_yomgir.groupby("Namlik")

# 2. For each group, compute the standard deviation and the number of rows, then sort by Std
data_namlik_yomgir = data_namlik_yomgir.agg(Std = ("Oyinchilar", lambda x: x.std(ddof=0)), Soni = ("Oyinchilar", np.count_nonzero)).sort_values("Std")
data_namlik_yomgir

Unnamed: 0_level_0,Std,Soni
Namlik,Unnamed: 1_level_1,Unnamed: 2_level_1
Yuqori,4.082483,3
Normal,5.0,2


In [37]:
# 1. Group the data by Shamol (True, False)
data_shamol_yomgir = data_yomgir.groupby("Shamol")

# 2. For each group, compute the standard deviation and the number of rows, then sort by Std
data_shamol_yomgir = data_shamol_yomgir.agg(Std = ("Oyinchilar", lambda x: x.std(ddof=0)), Soni = ("Oyinchilar", np.count_nonzero)).sort_values("Std")
data_shamol_yomgir

Unnamed: 0_level_0,Std,Soni
Shamol,Unnamed: 1_level_1,Unnamed: 2_level_1
False,5.557777,3
True,9.0,2


### Now compute the standard deviation reduction

* `Temp`

In [38]:
# Within the Yomg'ir rows: divide each group's row count by the total number of rows, multiply by the group's
# standard deviation, and subtract the sum from the STD of the Oyinchilar column (7.782030583337487)
stdr_temp_yomgir = uyinchi_yomgir.std() - np.dot(data_temp_yomgir["Soni"]/5,data_temp_yomgir["Std"])
stdr_temp_yomgir

4.182030583337488

* `Namlik`

In [39]:
# Within the Yomg'ir rows: divide each group's row count by the total number of rows, multiply by the group's
# standard deviation, and subtract the sum from the STD of the Oyinchilar column (7.782030583337487)
stdr_namlik_yomgir = uyinchi_yomgir.std() - np.dot(data_namlik_yomgir["Soni"]/5,data_namlik_yomgir["Std"])
stdr_namlik_yomgir

3.3325408405543087

* `Shamol`

In [40]:
# Within the Yomg'ir rows: divide each group's row count by the total number of rows, multiply by the group's
# standard deviation, and subtract the sum from the STD of the Oyinchilar column (7.782030583337487)
stdr_shamol_yomgir = uyinchi_yomgir.std() - np.dot(data_shamol_yomgir["Soni"]/5,data_shamol_yomgir["Std"])
stdr_shamol_yomgir

0.8473641832308747

### `Conclusion:` Of the values computed above (Temp = 4.182030583337488, Namlik = 3.3325408405543087, Shamol = 0.8473641832308747), Temp is the largest, so it becomes the next split of the tree

<img src="5.png">

### Step 7

In [41]:
data_yomgir[["Temp","Oyinchilar"]]

Unnamed: 0,Temp,Oyinchilar
0,Issiq,25
1,Issiq,30
7,Yaxshi,35
8,Salqin,38
10,Yaxshi,48


In [42]:
# Salqin - 1 - 38/1
# Issiq  - 2 - 55/2
# Yaxshi - 2 - 82/2

# Salqin 	0.0 	1
# Issiq 	2.5 	2
# Yaxshi 	6.5 	2

sa_ur = 38/1
is_ur = 55/2
yax_ur = 82/2

salqin = (0.0/sa_ur)*100
issiq = (2.5/is_ur)*100
yaxshi = (6.5/yax_ur)*100

* `Temp` is the split column here, so we compute the mean and CV of each of its groups. A branch stops growing when its CV is below 10% or its group has fewer than 3 rows.

In [43]:
data_temp_yomgir["AVG"] = sa_ur,is_ur, yax_ur
data_temp_yomgir["CV"] = salqin,issiq,yaxshi
data_temp_yomgir

Unnamed: 0_level_0,Std,Soni,AVG,CV
Temp,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Salqin,0.0,1,38.0,0.0
Issiq,2.5,2,27.5,9.090909
Yaxshi,6.5,2,41.0,15.853659


* Above, the CV of Salqin and Issiq is below 10%, and every group has fewer than 3 rows, so we stop with Salqin = 38, Issiq = 27.5, and Yaxshi = 41

<img src="6.png">

## Decision Tree Classification

* The `Decision Tree Classification` algorithm tries to separate the data completely, so that every leaf node (a node that does not split the data any further) belongs to a single class. Such nodes are called pure leaf nodes.
<img src="cl1.webp">
* In practice, however, you will often encounter mixed leaf nodes, where the data points do not all share the same class.

* `Gini Impurity` is a measure of dispersion across the classes within a node
<img src="gini.webp">

* `Entropy` - like Gini impurity, entropy measures the disorder within a node. In the context of decision trees, disorder is greatest in a node where all classes are equally represented.

<img src="entropy.webp">
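Both impurity measures are short to compute from a list of class labels. The sketch below uses the standard definitions (Gini impurity as one minus the sum of squared class probabilities, entropy as the negative sum of `p * log2(p)`); the function names are illustrative and not taken from the notebook.

```python
import math
from collections import Counter


def class_probabilities(labels):
    """Share of each class among the labels."""
    counts = Counter(labels)
    return [count / len(labels) for count in counts.values()]


def gini(labels):
    """Gini impurity: 1 - sum(p_k ** 2)."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))


def entropy(labels):
    """Shannon entropy in bits: sum(-p_k * log2(p_k))."""
    return sum(-p * math.log2(p) for p in class_probabilities(labels))


pure = [1, 1, 1, 1]          # a pure node: one class only
balanced = [0, 1, 0, 1]      # a maximally mixed two-class node

print(gini(pure), entropy(pure))
print(gini(balanced), entropy(balanced))
```

For two classes, both measures are 0 on a pure node and peak on a 50/50 node, where Gini impurity is 0.5 and entropy is 1 bit.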

In [44]:
data.describe()

Unnamed: 0,Oyinchilar
count,14.0
mean,39.785714
std,9.672949
min,23.0
25%,31.25
50%,43.5
75%,46.0
max,52.0


In [45]:
for i in range(len(data)):
    if data.loc[i,"Oyinchilar"] > 39:
        data.loc[i,"Oyinchilar"] = 1
    else:
        data.loc[i,"Oyinchilar"] = 0

In [46]:
data

Unnamed: 0,Ob-Havo,Temp,Namlik,Shamol,Oyinchilar
0,Yomg'ir,Issiq,Yuqori,False,0
1,Yomg'ir,Issiq,Yuqori,True,0
2,Bulutli,Issiq,Yuqori,False,1
3,Quyoshli,Yaxshi,Yuqori,False,1
4,Quyoshli,Salqin,Normal,False,1
5,Quyoshli,Salqin,Normal,True,0
6,Bulutli,Salqin,Normal,True,1
7,Yomg'ir,Yaxshi,Yuqori,False,0
8,Yomg'ir,Salqin,Normal,False,0
9,Quyoshli,Yaxshi,Normal,False,1
10,Yomg'ir,Yaxshi,Normal,True,1
11,Bulutli,Yaxshi,Yuqori,True,1
12,Bulutli,Issiq,Normal,False,1
13,Quyoshli,Yaxshi,Yuqori,True,0


In [47]:
true = len(data[data["Oyinchilar"]==1])
false = len(data[data["Oyinchilar"]==0])

In [48]:
true

8

In [49]:
false

6

In [50]:
true_pr = true / len(data["Oyinchilar"])
false_pr = false / len(data["Oyinchilar"])

In [51]:
true_pr

0.5714285714285714

In [52]:
false_pr

0.42857142857142855

In [53]:
entropy_label = - true_pr * np.log2(true_pr) - false_pr * np.log2(false_pr)

In [54]:
entropy_label

0.9852281360342515

## Entropy for each feature

In [55]:
data

Unnamed: 0,Ob-Havo,Temp,Namlik,Shamol,Oyinchilar
0,Yomg'ir,Issiq,Yuqori,False,0
1,Yomg'ir,Issiq,Yuqori,True,0
2,Bulutli,Issiq,Yuqori,False,1
3,Quyoshli,Yaxshi,Yuqori,False,1
4,Quyoshli,Salqin,Normal,False,1
5,Quyoshli,Salqin,Normal,True,0
6,Bulutli,Salqin,Normal,True,1
7,Yomg'ir,Yaxshi,Yuqori,False,0
8,Yomg'ir,Salqin,Normal,False,0
9,Quyoshli,Yaxshi,Normal,False,1
10,Yomg'ir,Yaxshi,Normal,True,1
11,Bulutli,Yaxshi,Yuqori,True,1
12,Bulutli,Issiq,Normal,False,1
13,Quyoshli,Yaxshi,Yuqori,True,0


`Ob-havo`

In [56]:
data[data["Ob-Havo"]=="Yomg'ir"]

Unnamed: 0,Ob-Havo,Temp,Namlik,Shamol,Oyinchilar
0,Yomg'ir,Issiq,Yuqori,False,0
1,Yomg'ir,Issiq,Yuqori,True,0
7,Yomg'ir,Yaxshi,Yuqori,False,0
8,Yomg'ir,Salqin,Normal,False,0
10,Yomg'ir,Yaxshi,Normal,True,1


In [57]:
data[data["Ob-Havo"]=="Bulutli"]

Unnamed: 0,Ob-Havo,Temp,Namlik,Shamol,Oyinchilar
2,Bulutli,Issiq,Yuqori,False,1
6,Bulutli,Salqin,Normal,True,1
11,Bulutli,Yaxshi,Yuqori,True,1
12,Bulutli,Issiq,Normal,False,1


In [58]:
data[data["Ob-Havo"]=="Quyoshli"]

Unnamed: 0,Ob-Havo,Temp,Namlik,Shamol,Oyinchilar
3,Quyoshli,Yaxshi,Yuqori,False,1
4,Quyoshli,Salqin,Normal,False,1
5,Quyoshli,Salqin,Normal,True,0
9,Quyoshli,Yaxshi,Normal,False,1
13,Quyoshli,Yaxshi,Yuqori,True,0


In [59]:
obhavo_entropy = (-(4/5)*np.log2(4/5)-(1/5)*np.log2(1/5)) * 5/14 + (-np.log2(1)) * 4/14 + (-(2/5)*np.log2(2/5)-(3/5)*np.log2(3/5)) * 5/14

In [60]:
obhavo_entropy

0.6045995319078682

`Temp`

In [61]:
data[data["Temp"]=="Issiq"]

Unnamed: 0,Ob-Havo,Temp,Namlik,Shamol,Oyinchilar
0,Yomg'ir,Issiq,Yuqori,False,0
1,Yomg'ir,Issiq,Yuqori,True,0
2,Bulutli,Issiq,Yuqori,False,1
12,Bulutli,Issiq,Normal,False,1


In [62]:
data[data["Temp"]=="Yaxshi"]

Unnamed: 0,Ob-Havo,Temp,Namlik,Shamol,Oyinchilar
3,Quyoshli,Yaxshi,Yuqori,False,1
7,Yomg'ir,Yaxshi,Yuqori,False,0
9,Quyoshli,Yaxshi,Normal,False,1
10,Yomg'ir,Yaxshi,Normal,True,1
11,Bulutli,Yaxshi,Yuqori,True,1
13,Quyoshli,Yaxshi,Yuqori,True,0


In [63]:
data[data["Temp"]=="Salqin"]

Unnamed: 0,Ob-Havo,Temp,Namlik,Shamol,Oyinchilar
4,Quyoshli,Salqin,Normal,False,1
5,Quyoshli,Salqin,Normal,True,0
6,Bulutli,Salqin,Normal,True,1
8,Yomg'ir,Salqin,Normal,False,0


In [64]:
temp_entropy = (-(2/4)*np.log2(2/4)-(2/4)*np.log2(2/4)) * 4/14 + (-(2/4)*np.log2(2/4)-(2/4)*np.log2(2/4)) * 4/14 + (-(2/6)*np.log2(2/6)-(4/6)*np.log2(4/6)) * 6/14

In [65]:
temp_entropy

0.9649839288804956

`Namlik`

In [66]:
data[data["Namlik"]=="Normal"]

Unnamed: 0,Ob-Havo,Temp,Namlik,Shamol,Oyinchilar
4,Quyoshli,Salqin,Normal,False,1
5,Quyoshli,Salqin,Normal,True,0
6,Bulutli,Salqin,Normal,True,1
8,Yomg'ir,Salqin,Normal,False,0
9,Quyoshli,Yaxshi,Normal,False,1
10,Yomg'ir,Yaxshi,Normal,True,1
12,Bulutli,Issiq,Normal,False,1


In [67]:
data[data["Namlik"]=="Yuqori"]

Unnamed: 0,Ob-Havo,Temp,Namlik,Shamol,Oyinchilar
0,Yomg'ir,Issiq,Yuqori,False,0
1,Yomg'ir,Issiq,Yuqori,True,0
2,Bulutli,Issiq,Yuqori,False,1
3,Quyoshli,Yaxshi,Yuqori,False,1
7,Yomg'ir,Yaxshi,Yuqori,False,0
11,Bulutli,Yaxshi,Yuqori,True,1
13,Quyoshli,Yaxshi,Yuqori,True,0


In [68]:
namlik_entropy = (-(2/7)*np.log2(2/7)-(5/7)*np.log2(5/7)) * 7/14 + (-(3/7)*np.log2(3/7)-(4/7)*np.log2(4/7)) * 7/14

In [69]:
namlik_entropy

0.9241743523004413

`Shamol`

In [70]:
data[data["Shamol"]==True]

Unnamed: 0,Ob-Havo,Temp,Namlik,Shamol,Oyinchilar
1,Yomg'ir,Issiq,Yuqori,True,0
5,Quyoshli,Salqin,Normal,True,0
6,Bulutli,Salqin,Normal,True,1
10,Yomg'ir,Yaxshi,Normal,True,1
11,Bulutli,Yaxshi,Yuqori,True,1
13,Quyoshli,Yaxshi,Yuqori,True,0


In [71]:
data[data["Shamol"]==False]

Unnamed: 0,Ob-Havo,Temp,Namlik,Shamol,Oyinchilar
0,Yomg'ir,Issiq,Yuqori,False,0
2,Bulutli,Issiq,Yuqori,False,1
3,Quyoshli,Yaxshi,Yuqori,False,1
4,Quyoshli,Salqin,Normal,False,1
7,Yomg'ir,Yaxshi,Yuqori,False,0
8,Yomg'ir,Salqin,Normal,False,0
9,Quyoshli,Yaxshi,Normal,False,1
12,Bulutli,Issiq,Normal,False,1


In [72]:
shamol_entropy = (-(3/6)*np.log2(3/6)-(3/6)*np.log2(3/6)) * 6/14 + (-(3/8)*np.log2(3/8)-(5/8)*np.log2(5/8)) * 8/14

In [73]:
shamol_entropy

0.9739622873856943

### `Information gain`

In [74]:
entropy_label - obhavo_entropy

0.38062860412638333

In [75]:
entropy_label - temp_entropy

0.020244207153755966

In [76]:
entropy_label - namlik_entropy

0.06105378373381021

In [77]:
entropy_label - shamol_entropy

0.011265848648557175
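The hand-written entropy expressions above follow one pattern: weight each group's entropy by its share of the rows, then subtract the total from the entropy of the target. The sketch below generalizes that pattern with the standard library only; the rows are reconstructed from the tables in this notebook, the target is binarized with the same `> 39` threshold, and `information_gain` is a hypothetical helper name.

```python
import math
from collections import Counter, defaultdict

COLUMNS = ("Ob-Havo", "Temp", "Namlik", "Shamol", "Oyinchilar")
ROWS = [
    ("Yomg'ir", "Issiq", "Yuqori", False, 25),
    ("Yomg'ir", "Issiq", "Yuqori", True, 30),
    ("Bulutli", "Issiq", "Yuqori", False, 46),
    ("Quyoshli", "Yaxshi", "Yuqori", False, 45),
    ("Quyoshli", "Salqin", "Normal", False, 52),
    ("Quyoshli", "Salqin", "Normal", True, 23),
    ("Bulutli", "Salqin", "Normal", True, 43),
    ("Yomg'ir", "Yaxshi", "Yuqori", False, 35),
    ("Yomg'ir", "Salqin", "Normal", False, 38),
    ("Quyoshli", "Yaxshi", "Normal", False, 46),
    ("Yomg'ir", "Yaxshi", "Normal", True, 48),
    ("Bulutli", "Yaxshi", "Yuqori", True, 52),
    ("Bulutli", "Issiq", "Normal", False, 44),
    ("Quyoshli", "Yaxshi", "Yuqori", True, 30),
]
# Binarize the target exactly as the loop above does: > 39 becomes 1, else 0.
rows = [dict(zip(COLUMNS, r[:4] + (int(r[4] > 39),))) for r in ROWS]


def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return sum(-c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, feature, target="Oyinchilar"):
    """Entropy of the target minus the size-weighted entropy of each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row[target])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - weighted


gains = {feature: information_gain(rows, feature) for feature in COLUMNS[:-1]}
best = max(gains, key=gains.get)
print(best, gains)
```

`Ob-Havo` has by far the largest gain, which matches the choice of root made from the four values above.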

# Finding the remaining branches with Ob-Havo as the root

In [78]:
data_quyosh = data[data["Ob-Havo"]=="Quyoshli"]

In [79]:
data_quyosh

Unnamed: 0,Ob-Havo,Temp,Namlik,Shamol,Oyinchilar
3,Quyoshli,Yaxshi,Yuqori,False,1
4,Quyoshli,Salqin,Normal,False,1
5,Quyoshli,Salqin,Normal,True,0
9,Quyoshli,Yaxshi,Normal,False,1
13,Quyoshli,Yaxshi,Yuqori,True,0


In [80]:
data_bulut = data[data["Ob-Havo"]=="Bulutli"]

In [81]:
data_bulut

Unnamed: 0,Ob-Havo,Temp,Namlik,Shamol,Oyinchilar
2,Bulutli,Issiq,Yuqori,False,1
6,Bulutli,Salqin,Normal,True,1
11,Bulutli,Yaxshi,Yuqori,True,1
12,Bulutli,Issiq,Normal,False,1


In [82]:
data_yomgir = data[data["Ob-Havo"]=="Yomg'ir"]

In [83]:
data_yomgir

Unnamed: 0,Ob-Havo,Temp,Namlik,Shamol,Oyinchilar
0,Yomg'ir,Issiq,Yuqori,False,0
1,Yomg'ir,Issiq,Yuqori,True,0
7,Yomg'ir,Yaxshi,Yuqori,False,0
8,Yomg'ir,Salqin,Normal,False,0
10,Yomg'ir,Yaxshi,Normal,True,1


`Temp`

In [84]:
data_quyosh[data_quyosh["Temp"]=="Yaxshi"]

Unnamed: 0,Ob-Havo,Temp,Namlik,Shamol,Oyinchilar
3,Quyoshli,Yaxshi,Yuqori,False,1
9,Quyoshli,Yaxshi,Normal,False,1
13,Quyoshli,Yaxshi,Yuqori,True,0


In [85]:
data_quyosh[data_quyosh["Temp"]=="Salqin"]

Unnamed: 0,Ob-Havo,Temp,Namlik,Shamol,Oyinchilar
4,Quyoshli,Salqin,Normal,False,1
5,Quyoshli,Salqin,Normal,True,0


In [86]:
temp_entropy_quyosh = (-(2/3)*np.log2(2/3)-(1/3)*np.log2(1/3)) * 3/5 + (-(1/2)*np.log2(1/2)-(1/2)*np.log2(1/2)) * 2/5

In [87]:
temp_entropy_quyosh

0.9509775004326938

`Namlik`

In [88]:
data_quyosh[data_quyosh["Namlik"]=="Normal"]

Unnamed: 0,Ob-Havo,Temp,Namlik,Shamol,Oyinchilar
4,Quyoshli,Salqin,Normal,False,1
5,Quyoshli,Salqin,Normal,True,0
9,Quyoshli,Yaxshi,Normal,False,1


In [89]:
data_quyosh[data_quyosh["Namlik"]=="Yuqori"]

Unnamed: 0,Ob-Havo,Temp,Namlik,Shamol,Oyinchilar
3,Quyoshli,Yaxshi,Yuqori,False,1
13,Quyoshli,Yaxshi,Yuqori,True,0


In [90]:
namlik_entropy_quyosh = (-(2/3)*np.log2(2/3)-(1/3)*np.log2(1/3)) * 3/5 + (-(1/2)*np.log2(1/2)-(1/2)*np.log2(1/2)) * 2/5

In [91]:
namlik_entropy_quyosh

0.9509775004326938

`Shamol`

In [92]:
data_quyosh[data_quyosh["Shamol"]==True]

Unnamed: 0,Ob-Havo,Temp,Namlik,Shamol,Oyinchilar
5,Quyoshli,Salqin,Normal,True,0
13,Quyoshli,Yaxshi,Yuqori,True,0


In [93]:
data_quyosh[data_quyosh["Shamol"]==False]

Unnamed: 0,Ob-Havo,Temp,Namlik,Shamol,Oyinchilar
3,Quyoshli,Yaxshi,Yuqori,False,1
4,Quyoshli,Salqin,Normal,False,1
9,Quyoshli,Yaxshi,Normal,False,1


In [94]:
shamol_entropy_quyosh = (-np.log2(1)) * 2/5 + (-np.log2(1)) * 3/5

In [95]:
shamol_entropy_quyosh

-0.0

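### `Information gain for Quyoshli`

* The cells below subtract from `entropy_label`, the entropy of the full dataset. Measured against the entropy of the Quyoshli subset itself (3 ones and 2 zeros, about 0.971), the gains are smaller, but `Shamol` still ranks first.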

In [96]:
entropy_label - temp_entropy_quyosh

0.03425063560155772

In [97]:
entropy_label - namlik_entropy_quyosh

0.03425063560155772

In [98]:
entropy_label - shamol_entropy_quyosh

0.9852281360342515
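Information gain inside a branch is conventionally measured against the entropy of that branch's own rows. The following sketch does this for the five Quyoshli rows, copied from the `data_quyosh` table above; the names `information_gain` and `gains` are illustrative, not part of the notebook.

```python
import math
from collections import Counter, defaultdict

# The five Quyoshli rows after binarization, copied from `data_quyosh` above.
quyoshli = [
    {"Temp": "Yaxshi", "Namlik": "Yuqori", "Shamol": False, "Oyinchilar": 1},
    {"Temp": "Salqin", "Namlik": "Normal", "Shamol": False, "Oyinchilar": 1},
    {"Temp": "Salqin", "Namlik": "Normal", "Shamol": True, "Oyinchilar": 0},
    {"Temp": "Yaxshi", "Namlik": "Normal", "Shamol": False, "Oyinchilar": 1},
    {"Temp": "Yaxshi", "Namlik": "Yuqori", "Shamol": True, "Oyinchilar": 0},
]


def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return sum(-c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, feature, target="Oyinchilar"):
    """Entropy of this subset minus the weighted entropy after the split."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row[target])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - weighted


gains = {f: information_gain(quyoshli, f) for f in ("Temp", "Namlik", "Shamol")}
best = max(gains, key=gains.get)
print(best, gains)
```

Splitting on `Shamol` leaves two pure groups, so its gain equals the whole entropy of the subset and the branch needs no further splits.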

### `Information gain for Bulutli`

* Every row in this group has the same target value, so the group is already pure and every information gain is 0

## When Ob-Havo is Yomg'ir

`Temp`

In [99]:
data_yomgir[data_yomgir["Temp"]=="Yaxshi"]

Unnamed: 0,Ob-Havo,Temp,Namlik,Shamol,Oyinchilar
7,Yomg'ir,Yaxshi,Yuqori,False,0
10,Yomg'ir,Yaxshi,Normal,True,1


In [100]:
data_yomgir[data_yomgir["Temp"]=="Salqin"]

Unnamed: 0,Ob-Havo,Temp,Namlik,Shamol,Oyinchilar
8,Yomg'ir,Salqin,Normal,False,0


In [101]:
data_yomgir[data_yomgir["Temp"]=="Issiq"]

Unnamed: 0,Ob-Havo,Temp,Namlik,Shamol,Oyinchilar
0,Yomg'ir,Issiq,Yuqori,False,0
1,Yomg'ir,Issiq,Yuqori,True,0


In [102]:
temp_entropy_yomgir = (-(1/2)*np.log2(1/2)-(1/2)*np.log2(1/2)) * 2/5

In [103]:
temp_entropy_yomgir

0.4

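### `Information gain for Yomg'ir`
* Only the `Temp` split is evaluated here. Its Salqin and Issiq groups are pure (entropy 0), so only the Yaxshi group contributes to the weighted entropy. The cell below subtracts from `entropy_label`, the entropy of the full dataset, rather than from the entropy of the Yomg'ir subset itself (about 0.722).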

In [104]:
entropy_label - temp_entropy_yomgir

0.5852281360342515

# Conclusion. A Decision Tree Regressor always returns a numeric value, and that value stays within the range of the targets it was trained on (for example, if the training targets lie between 40 and 50, the prediction always falls in that range).
# Decision Tree Classification, as its name suggests, classifies: it returns class labels such as yes/no.