# Linear regression
### Features:
* **abilities**:
    * **cha**
    * **con**
    * **dex**
    * **int**
    * **str**
    * **wis**
* **attributes**
    * **hp**
    * **ac**

### Target: *level*

In [1]:
import numpy as np
import pandas as pd

In [2]:
bestiary1 = pd.read_json("pathfinder_2e_data/pathfinder-bestiary.db", lines=True)
bestiary2 = pd.read_json("pathfinder_2e_data/pathfinder-bestiary-2.db", lines=True)
bestiary3 = pd.read_json("pathfinder_2e_data/pathfinder-bestiary-3.db", lines=True)

In [3]:
b = [bestiary1, bestiary2, bestiary3]

bestiary = pd.concat(b, join='outer', axis=0).fillna(np.nan)

In [4]:
print((bestiary1.size + bestiary2.size + bestiary3.size) == bestiary.size) # sanity check that nothing was lost

True


In [5]:
bestiary.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1206 entries, 0 to 364
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   _id             1206 non-null   object
 1   img             1206 non-null   object
 2   items           1206 non-null   object
 3   name            1206 non-null   object
 4   system          1206 non-null   object
 5   type            1206 non-null   object
 6   flags           1206 non-null   object
 7   prototypeToken  85 non-null     object
dtypes: object(8)
memory usage: 84.8+ KB
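
The index above runs from 0 to 364 even though the frame has 1206 rows, so row labels repeat across the three source files. The later column assignments work because every derived frame carries exactly the same index, but label-based lookups such as `.loc[0]` would return several rows. A minimal sketch of one way to avoid that, on toy frames (`part_a` and `part_b` are made-up stand-ins for the bestiaries), using the `ignore_index=True` option of `pd.concat`:

```python
import pandas as pd

# Toy stand-ins for the per-book frames; each starts its index at 0.
part_a = pd.DataFrame({"name": ["goblin", "orc"]})
part_b = pd.DataFrame({"name": ["dragon"]})

stacked = pd.concat([part_a, part_b], axis=0)
renumbered = pd.concat([part_a, part_b], axis=0, ignore_index=True)

print(stacked.index.is_unique)      # labels 0, 1, 0 repeat
print(renumbered.index.is_unique)   # labels 0, 1, 2 are unique
print(len(renumbered) == len(part_a) + len(part_b))  # row-count check
```

Comparing `len(...)` checks the number of rows directly, whereas `.size` counts cells (rows times columns).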


After concatenating, we check that every record is a creature (*npc*) and not, for example, an item.

In [6]:
bestiary.type.unique()

array(['npc'], dtype=object)

The bestiaries contain nothing but creatures, so nothing is dropped.

In [7]:
from copy import deepcopy

In [8]:
bestiary_backup = deepcopy(bestiary)

```unpack_column``` - a function that unpacks a column of dictionaries into a DataFrame, with one column per dictionary key.

*by P J*

In [9]:
def unpack_column(df, column_name):
    new_df = df[column_name].apply(pd.Series)
    return new_df
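
A short usage sketch of the helper on a hypothetical miniature of the nested data (the `toy` frame and its values are made up; only the nesting pattern mirrors the notebook):

```python
import pandas as pd

def unpack_column(df, column_name):
    # Same helper as above: one output column per dictionary key.
    return df[column_name].apply(pd.Series)

# Hypothetical miniature of the nested "system" column.
toy = pd.DataFrame({
    "system": [
        {"details": {"level": {"value": 1}}, "attributes": {"hp": {"value": 20}}},
        {"details": {"level": {"value": 3}}, "attributes": {"hp": {"value": 45}}},
    ]
})

system = unpack_column(toy, "system")
print(list(system.columns))   # the dictionary keys become columns

# Each call peels off one level of nesting.
hp = unpack_column(unpack_column(system, "attributes"), "hp")
print(hp["value"].tolist())
```

Only one level is unpacked per call, so inner dictionaries stay as dictionaries until the helper is applied again.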

## Extracting the *system* column and preparing the data
All the values we need are in this column.

In [10]:
bestiary_system = unpack_column(bestiary, "system")

### Preparing data from system: *abilities*
* cha
* con
* dex
* int
* str
* wis


In [11]:
abilities = unpack_column(bestiary_system, 'abilities')

for col in abilities.columns:
    for i, row in abilities.iterrows():
        row[col]=row[col].get('mod')

In [12]:
abilities.head()

Unnamed: 0,cha,con,dex,int,str,wis
0,1,5,2,1,7,2
1,1,5,0,-4,9,2
2,-2,6,3,-4,7,3
3,6,5,4,6,7,5
4,1,1,3,-1,-5,1


In [13]:
abilities.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1206 entries, 0 to 364
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   cha     1206 non-null   object
 1   con     1206 non-null   object
 2   dex     1206 non-null   object
 3   int     1206 non-null   object
 4   str     1206 non-null   object
 5   wis     1206 non-null   object
dtypes: object(6)
memory usage: 66.0+ KB
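
The loop in cell 11 writes into the rows yielded by `iterrows()`. The pandas documentation advises against modifying what is being iterated over, because depending on the dtypes the iterator may return a copy, in which case the write has no effect. The columns also remain `object` dtype, as the `info()` output shows. A possible alternative, sketched on made-up data (`toy_abilities` is hypothetical), builds a new frame instead and ends with integer columns:

```python
import pandas as pd

# Hypothetical cells shaped like the unpacked abilities: {"mod": <int>}.
toy_abilities = pd.DataFrame({
    "str": [{"mod": 7}, {"mod": -5}],
    "dex": [{"mod": 2}, {"mod": 3}],
})

# Pull the "mod" entry out of every cell, then make the dtype explicit.
mods = toy_abilities.apply(lambda col: col.map(lambda cell: cell.get("mod")))
mods = mods.astype("int64")

print(mods)
print(mods.dtypes)
```

The explicit `astype` assumes every cell has a `mod` entry; a missing one would give `None` and make the conversion fail, which is arguably a useful early warning.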


### Preparing data from attributes

In [14]:
attributes = unpack_column(bestiary_system, 'attributes')

#### Preparing attributes: *ac*
The model uses the `value` field of ac.

In [15]:
attributes_ac = unpack_column(attributes, "ac")

In [16]:
attributes_ac = pd.DataFrame(data=attributes_ac.value)
attributes_ac.columns = ["ac"]

In [17]:
attributes_ac.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1206 entries, 0 to 364
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   ac      1206 non-null   int64
dtypes: int64(1)
memory usage: 18.8 KB


In [18]:
attributes_ac.head()

Unnamed: 0,ac
0,29
1,28
2,25
3,41
4,16


#### Preparing attributes: *hp*
The model uses the `value` field of hp; for now the remaining hp fields are ignored.

In [19]:
attributes_hp = unpack_column(attributes, "hp")

In [20]:
attributes_hp = pd.DataFrame(data=attributes_hp.value)
attributes_hp.columns = ["hp"]

In [21]:
attributes_hp.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1206 entries, 0 to 364
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   hp      1206 non-null   int64
dtypes: int64(1)
memory usage: 18.8 KB


In [22]:
attributes_hp.head()

Unnamed: 0,hp
0,215
1,220
2,175
3,315
4,20
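
The same nested values can also be reached in one step with `pd.json_normalize`, which flattens nested dictionaries into dot-separated column names. A sketch on hypothetical records shaped like the `system` entries (the records and the `max` key are made up for illustration):

```python
import pandas as pd

# Hypothetical records; only the nesting pattern mirrors the notebook.
records = [
    {"attributes": {"ac": {"value": 29}, "hp": {"value": 215, "max": 215}},
     "details": {"level": {"value": 10}}},
    {"attributes": {"ac": {"value": 16}, "hp": {"value": 20, "max": 20}},
     "details": {"level": {"value": 1}}},
]

flat = pd.json_normalize(records)

# Keep only the fields of interest and give them short names.
picked = flat[
    ["attributes.ac.value", "attributes.hp.value", "details.level.value"]
].rename(columns={
    "attributes.ac.value": "ac",
    "attributes.hp.value": "hp",
    "details.level.value": "level",
})
print(picked)
```

Note that `json_normalize` returns a fresh 0-based index, so the result would need the original index reattached before being assigned back to a frame with different labels.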


## Removing unneeded columns from the original table and adding the data prepared above

In [23]:
# bestiary_copy = deepcopy(bestiary)

In [24]:
# bestiary = deepcopy(bestiary_copy)

In [25]:
# bestiary = deepcopy(bestiary_backup)

In [26]:
for col in abilities:
    bestiary[col] = abilities[col]

In [27]:
bestiary.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1206 entries, 0 to 364
Data columns (total 14 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   _id             1206 non-null   object
 1   img             1206 non-null   object
 2   items           1206 non-null   object
 3   name            1206 non-null   object
 4   system          1206 non-null   object
 5   type            1206 non-null   object
 6   flags           1206 non-null   object
 7   prototypeToken  85 non-null     object
 8   cha             1206 non-null   object
 9   con             1206 non-null   object
 10  dex             1206 non-null   object
 11  int             1206 non-null   object
 12  str             1206 non-null   object
 13  wis             1206 non-null   object
dtypes: object(14)
memory usage: 141.3+ KB


In [28]:
bestiary["ac"] = attributes_ac

In [29]:
bestiary.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1206 entries, 0 to 364
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   _id             1206 non-null   object
 1   img             1206 non-null   object
 2   items           1206 non-null   object
 3   name            1206 non-null   object
 4   system          1206 non-null   object
 5   type            1206 non-null   object
 6   flags           1206 non-null   object
 7   prototypeToken  85 non-null     object
 8   cha             1206 non-null   object
 9   con             1206 non-null   object
 10  dex             1206 non-null   object
 11  int             1206 non-null   object
 12  str             1206 non-null   object
 13  wis             1206 non-null   object
 14  ac              1206 non-null   int64 
dtypes: int64(1), object(14)
memory usage: 150.8+ KB


In [30]:
bestiary["hp"] = attributes_hp

In [31]:
bestiary.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1206 entries, 0 to 364
Data columns (total 16 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   _id             1206 non-null   object
 1   img             1206 non-null   object
 2   items           1206 non-null   object
 3   name            1206 non-null   object
 4   system          1206 non-null   object
 5   type            1206 non-null   object
 6   flags           1206 non-null   object
 7   prototypeToken  85 non-null     object
 8   cha             1206 non-null   object
 9   con             1206 non-null   object
 10  dex             1206 non-null   object
 11  int             1206 non-null   object
 12  str             1206 non-null   object
 13  wis             1206 non-null   object
 14  ac              1206 non-null   int64 
 15  hp              1206 non-null   int64 
dtypes: int64(2), object(14)
memory usage: 160.2+ KB


In [32]:
# Drop every column that is not used as a feature.
bestiary.drop(
    columns=["prototypeToken", "flags", "type", "system", "name", "items", "img", "_id"],
    inplace=True,
)

In [33]:
bestiary.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1206 entries, 0 to 364
Data columns (total 8 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   cha     1206 non-null   object
 1   con     1206 non-null   object
 2   dex     1206 non-null   object
 3   int     1206 non-null   object
 4   str     1206 non-null   object
 5   wis     1206 non-null   object
 6   ac      1206 non-null   int64 
 7   hp      1206 non-null   int64 
dtypes: int64(2), object(6)
memory usage: 84.8+ KB


In [34]:
bestiary.head()

Unnamed: 0,cha,con,dex,int,str,wis,ac,hp
0,1,5,2,1,7,2,29,215
1,1,5,0,-4,9,2,28,220
2,-2,6,3,-4,7,3,25,175
3,6,5,4,6,7,5,41,315
4,1,1,3,-1,-5,1,16,20


## Extracting the target value *level*

In [35]:
level = unpack_column(unpack_column(bestiary_system, "details"), "level")

In [36]:
level.columns = ["level"]

In [37]:
level.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1206 entries, 0 to 364
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   level   1206 non-null   int64
dtypes: int64(1)
memory usage: 18.8 KB


In [38]:
level.head()

Unnamed: 0,level
0,10
1,10
2,8
3,17
4,1


## Splitting into training and test sets

In [39]:
x = deepcopy(bestiary)
y = deepcopy(level)

In [40]:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0, shuffle=True
)

## Linear regression

In [41]:
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(x_train, y_train)
predict = model.predict(x_test)
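
To see what a fitted `LinearRegression` exposes, here is a minimal sketch on synthetic data (the weights `true_w`, the intercept, and the noise-free target are made up). Passing the target as a 1-D array also makes `predict` return a 1-D array, rather than the column-shaped array obtained when `y` is a one-column DataFrame.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])   # made-up weights
y = X @ true_w + 4.0                  # made-up intercept, no noise

model = LinearRegression().fit(X, y)

print(model.coef_)        # one weight per feature
print(model.intercept_)   # the constant term
print(model.predict(X[:5]).shape)
```

On the bestiary data, inspecting `coef_` next to the column names would show how strongly each ability, `ac`, and `hp` contribute to the predicted level.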

## Evaluating the model

In [42]:
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

r2 = r2_score(y_test, predict)
MSE = mean_squared_error(y_test, predict)
print("R2 score:", round(r2, 2))
print("Mean square error:", round(MSE, 2))

R2 score: 0.98
Mean square error: 0.59


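* R2 is close to 1, which is good.
* MSE is 0.59, i.e. an RMSE of about 0.77 levels. It would be nice to get it closer to 0. Feature scaling by itself does not change ordinary least squares predictions, so an improvement would have to come from something else, such as extra features or a different model.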
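
On the scaling idea: ordinary least squares with an intercept gives the same predictions whether or not the features are standardised, because the rescaling is absorbed into the coefficients. A small check on synthetic data (all arrays are made up, not the bestiary), which also shows how RMSE follows from MSE:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Four features on very different scales.
X = rng.normal(size=(300, 4)) * np.array([1.0, 10.0, 100.0, 0.1])
y = X @ np.array([0.5, -0.2, 0.03, 4.0]) + rng.normal(scale=0.5, size=300)

X_train, X_test, y_train, y_test = X[:200], X[200:], y[:200], y[200:]

raw_pred = LinearRegression().fit(X_train, y_train).predict(X_test)

# Fit the scaler on the training part only, then refit the model.
scaler = StandardScaler().fit(X_train)
scaled_pred = (
    LinearRegression()
    .fit(scaler.transform(X_train), y_train)
    .predict(scaler.transform(X_test))
)

print(np.allclose(raw_pred, scaled_pred))
mse = mean_squared_error(y_test, raw_pred)
print("MSE:", round(mse, 2), "RMSE:", round(float(np.sqrt(mse)), 2))
```

Scaling does matter for regularised models such as ridge or lasso, so it becomes relevant if the plain linear model is later replaced by one of those.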

In [43]:
from sklearn.metrics import mean_absolute_error, median_absolute_error, explained_variance_score

print("Mean absolute error:", round(mean_absolute_error(y_test, predict), 2))
print("Median absolute error:", round(median_absolute_error(y_test, predict), 2))
print("Explained variance score:", round(explained_variance_score(y_test, predict), 2))

Mean absolute error: 0.53
Median absolute error: 0.33
Explained variance score: 0.98
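
For reference, the metrics used here are simple functions of the residuals. A sketch with NumPy on made-up numbers (`y_true` and `y_pred` are hypothetical), following the standard definitions and cross-checked against scikit-learn:

```python
import numpy as np
from sklearn.metrics import explained_variance_score, r2_score

y_true = np.array([4.0, 7.0, 13.0, 8.0, 10.0])   # made-up levels
y_pred = np.array([4.5, 7.0, 12.0, 8.5, 10.0])   # made-up predictions
resid = y_true - y_pred

mse = np.mean(resid ** 2)                  # mean squared error
mae = np.mean(np.abs(resid))               # mean absolute error
medae = np.median(np.abs(resid))           # median absolute error
r2 = 1 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
explained_var = 1 - np.var(resid) / np.var(y_true)

print(mse, mae, medae)
print(np.isclose(r2, r2_score(y_true, y_pred)))
print(np.isclose(explained_var, explained_variance_score(y_true, y_pred)))
```

R2 and the explained variance score differ only when the residuals have a non-zero mean, which is why the two values reported for the model coincide after rounding.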


In [44]:
y_test

Unnamed: 0,level
395,4
5,7
141,13
452,8
52,10
...,...
77,13
11,0
155,13
240,13


In [46]:
predict

array([[ 3.97071151],
       [ 7.55108636],
       [12.65227177],
       [ 8.3011707 ],
       [10.55697225],
       [ 2.42171933],
       [12.8454115 ],
       [ 0.56995194],
       [ 2.48233849],
       [ 2.41320599],
       [ 4.31566236],
       [ 2.58998839],
       [ 5.18886725],
       [ 7.78325486],
       [ 0.76054551],
       [ 2.18391812],
       [ 7.6916723 ],
       [ 7.60719381],
       [ 1.29629252],
       [ 3.91429822],
       [10.41437684],
       [ 6.0752099 ],
       [ 2.87468923],
       [ 2.71822916],
       [ 0.89218769],
       [ 1.28255094],
       [ 3.94032307],
       [ 3.11332108],
       [ 9.23999991],
       [ 0.54276186],
       [ 2.2350029 ],
       [ 6.51554244],
       [ 3.11332108],
       [ 3.13476207],
       [ 8.79578503],
       [ 2.72362384],
       [10.38392818],
       [ 0.89218769],
       [ 5.79535245],
       [ 1.32809702],
       [ 5.78546017],
       [12.02439583],
       [14.71501779],
       [10.65396054],
       [ 1.84903057],
       [ 5

In [45]:
print(min(y_test.level))
print(max(y_test.level))

-1
21
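
Levels are integers (between -1 and 21 in this test set), while the regression returns real numbers. One possible post-processing step, which is not part of the original notebook, is to round each prediction to the nearest integer and clip it to a plausible range. The bounds and the sample arrays below are assumptions for illustration:

```python
import numpy as np

raw = np.array([3.97, 7.55, 12.65, -1.8, 21.6])   # made-up predictions
actual = np.array([4, 7, 13, -1, 21])             # made-up true levels

LOW, HIGH = -1, 21  # assumed bounds, taken from the min/max of the test set

# Round to the nearest integer, then force the result into [LOW, HIGH].
levels = np.clip(np.rint(raw), LOW, HIGH).astype(int)

print(levels)
print("share of exact matches:", np.mean(levels == actual))
```

With integer predictions, the share of exact matches becomes a natural extra metric alongside MSE and MAE.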
