# Linear regression
### Parameters:
* **abilities**:
    * **cha**
    * **con**
    * **dex**
    * **int**
    * **str**
    * **wis**
* **attributes**
    * **hp**
    * **ac**

### Predicted value: *level*

In [9]:
import numpy as np
import pandas as pd

In [10]:
bestiary1 = pd.read_json("pathfinder_2e_data/pathfinder-bestiary.db", lines=True)
bestiary2 = pd.read_json("pathfinder_2e_data/pathfinder-bestiary-2.db", lines=True)
bestiary3 = pd.read_json("pathfinder_2e_data/pathfinder-bestiary-3.db", lines=True)

In [11]:
b = [bestiary1, bestiary2, bestiary3]

bestiary = pd.concat(b, join='outer', axis=0).fillna(np.nan)

In [12]:
print((bestiary1.size + bestiary2.size + bestiary3.size) == bestiary.size)

True


In [13]:
bestiary.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1206 entries, 0 to 364
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   _id             1206 non-null   object
 1   img             1206 non-null   object
 2   items           1206 non-null   object
 3   name            1206 non-null   object
 4   system          1206 non-null   object
 5   type            1206 non-null   object
 6   flags           1206 non-null   object
 7   prototypeToken  85 non-null     object
dtypes: object(8)
memory usage: 84.8+ KB
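The index range reported above (1206 entries, labels 0 to 364) suggests that `pd.concat` kept each source frame's own row labels, so labels repeat across the three bestiaries. The following is a minimal sketch on toy frames (the data is hypothetical, not taken from the bestiary files) showing that behaviour and how `ignore_index=True` assigns fresh, unique labels instead.

```python
import pandas as pd

# Toy stand-ins for the bestiary frames (hypothetical data).
a = pd.DataFrame({"name": ["goblin", "orc"]})
b = pd.DataFrame({"name": ["dragon", "lich", "imp"]})

# Default behaviour: each frame keeps its own labels, so they repeat.
kept = pd.concat([a, b], join="outer", axis=0)

# ignore_index=True discards the old labels and numbers rows 0..n-1.
fresh = pd.concat([a, b], join="outer", axis=0, ignore_index=True)

print(kept.index.tolist())    # [0, 1, 0, 1, 2]
print(kept.index.is_unique)   # False
print(fresh.index.tolist())   # [0, 1, 2, 3, 4]
print(fresh.index.is_unique)  # True
```

A unique index matters for any later step that aligns frames by label, such as `DataFrame.join`.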


After concatenating, we check that every record is a creature (*npc*) and not, for example, an item.

In [25]:
bestiary.type.unique()

array(['npc'], dtype=object)

The bestiaries contain nothing but creatures, so we do not remove anything.

In [None]:
from copy import deepcopy

In [None]:
bestiary_backup = deepcopy(bestiary)

`unpack_column`: a function that unpacks a column of dictionaries into a new DataFrame, with one column per dictionary key.

*by P J*

In [19]:
def unpack_column(df, column_name):
    # Turn each dict in the column into a row; the dict keys become columns.
    new_df = df[column_name].apply(pd.Series)
    return new_df
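To make the effect of `unpack_column` concrete, here is a small sketch on a toy frame. The nested layout of the `system` dictionaries below is an assumption made for illustration; the real records contain many more keys.

```python
import pandas as pd


def unpack_column(df, column_name):
    # Same helper as in the notebook: one output column per dict key.
    return df[column_name].apply(pd.Series)


# Hypothetical two-row frame with a nested "system" column.
toy = pd.DataFrame({
    "system": [
        {"abilities": {"str": {"mod": 7}}, "details": {"level": {"value": 10}}},
        {"abilities": {"str": {"mod": -5}}, "details": {"level": {"value": 1}}},
    ]
})

system = unpack_column(toy, "system")
print(system.columns.tolist())   # ['abilities', 'details']

# Only one nesting level is unpacked per call; inner dicts stay as dicts.
details = unpack_column(system, "details")
print(details.loc[0, "level"])   # {'value': 10}
```

Each call peels off one level of nesting, which is why the notebook applies the helper repeatedly to reach deeper values.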

## Extracting the *system* column and preparing the data
All the values we need are in this column.

In [20]:
bestiary_system = unpack_column(bestiary, "system")

### Preparing the data from system: *abilities*
* cha
* con
* dex
* int
* str
* wis


In [22]:
abilities = unpack_column(bestiary_system, 'abilities')

for col in abilities.columns:
    for i, row in abilities.iterrows():
        row[col]=row[col].get('mod')
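Assigning into the `row` objects yielded by `iterrows` is fragile, because pandas does not guarantee that they are views on the original frame. A possible alternative, sketched here on toy data (the values are hypothetical), maps every cell to its `mod` entry and builds a new frame.

```python
import pandas as pd

# Hypothetical frame shaped like the unpacked abilities: one dict per cell.
abilities_toy = pd.DataFrame({
    "str": [{"mod": 7}, {"mod": -5}],
    "dex": [{"mod": 2}, {"mod": 3}],
})

# For every column, replace each dict with the value stored under "mod".
mods = abilities_toy.apply(
    lambda column: column.map(lambda cell: cell.get("mod"))
)

print(mods)
```

This variant returns a new frame instead of mutating in place, and pandas will usually infer a numeric dtype for the result rather than keeping `object` columns.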

In [23]:
abilities.head()

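|   | cha | con | dex | int | str | wis |
|---|-----|-----|-----|-----|-----|-----|
| 0 | 1 | 5 | 2 | 1 | 7 | 2 |
| 1 | 1 | 5 | 0 | -4 | 9 | 2 |
| 2 | -2 | 6 | 3 | -4 | 7 | 3 |
| 3 | 6 | 5 | 4 | 6 | 7 | 5 |
| 4 | 1 | 1 | 3 | -1 | -5 | 1 |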


In [24]:
abilities.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1206 entries, 0 to 364
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   cha     1206 non-null   object
 1   con     1206 non-null   object
 2   dex     1206 non-null   object
 3   int     1206 non-null   object
 4   str     1206 non-null   object
 5   wis     1206 non-null   object
dtypes: object(6)
memory usage: 66.0+ KB


### Preparing the data from attributes

In [67]:
attributes = unpack_column(bestiary_system, 'attributes')

#### Preparing attributes: *ac*
The model uses the `value` field of ac.

In [68]:
attributes_ac = unpack_column(attributes, "ac")

In [69]:
attributes_ac = pd.DataFrame(data=attributes_ac.value)
attributes_ac.columns = ["ac"]

#### Preparing attributes: *hp*
The model uses the `value` field of hp; for now we ignore the other fields.

In [70]:
attributes_hp = unpack_column(attributes, "hp")

In [71]:
attributes_hp = pd.DataFrame(data=attributes_hp.value)
attributes_hp.columns = ["hp"]
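The repeated unpack-then-select steps for `ac` and `hp` could also be expressed with `pd.json_normalize`, which flattens nested dictionaries into dotted column names. This is only a sketch of that alternative on hypothetical records; the `max` key is an assumed example of an "other" hp field and is not confirmed by the data shown here.

```python
import pandas as pd

# Hypothetical records mimicking the nested layout of the system column.
records = [
    {"attributes": {"ac": {"value": 30}, "hp": {"value": 215, "max": 215}}},
    {"attributes": {"ac": {"value": 16}, "hp": {"value": 20, "max": 20}}},
]

# Nested keys are joined with "." to form flat column names.
flat = pd.json_normalize(records)

# Keep only the two values used as features and give them short names.
features = flat[["attributes.ac.value", "attributes.hp.value"]].rename(
    columns={"attributes.ac.value": "ac", "attributes.hp.value": "hp"}
)

print(features)
```

On the real data, the input would be the list of dictionaries from the `system` column, for example `bestiary["system"].tolist()`.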

## Removing unneeded values from the original table and adding the data prepared above

In [76]:
bestiary = bestiary.join(abilities)

In [77]:
bestiary.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3412 entries, 0 to 467
Data columns (total 14 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   _id             3412 non-null   object
 1   img             3412 non-null   object
 2   items           3412 non-null   object
 3   name            3412 non-null   object
 4   system          3412 non-null   object
 5   type            3412 non-null   object
 6   flags           3412 non-null   object
 7   prototypeToken  235 non-null    object
 8   cha             3412 non-null   object
 9   con             3412 non-null   object
 10  dex             3412 non-null   object
 11  int             3412 non-null   object
 12  str             3412 non-null   object
 13  wis             3412 non-null   object
dtypes: object(14)
memory usage: 399.8+ KB


In [78]:
bestiary = bestiary.join(attributes_ac)

In [79]:
bestiary.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10014 entries, 0 to 467
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   _id             10014 non-null  object
 1   img             10014 non-null  object
 2   items           10014 non-null  object
 3   name            10014 non-null  object
 4   system          10014 non-null  object
 5   type            10014 non-null  object
 6   flags           10014 non-null  object
 7   prototypeToken  683 non-null    object
 8   cha             10014 non-null  object
 9   con             10014 non-null  object
 10  dex             10014 non-null  object
 11  int             10014 non-null  object
 12  str             10014 non-null  object
 13  wis             10014 non-null  object
 14  ac              10014 non-null  int64 
dtypes: int64(1), object(14)
memory usage: 1.2+ MB


In [81]:
bestiary = bestiary.join(attributes_hp)

In [82]:
bestiary.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 29788 entries, 0 to 467
Data columns (total 16 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   _id             29788 non-null  object
 1   img             29788 non-null  object
 2   items           29788 non-null  object
 3   name            29788 non-null  object
 4   system          29788 non-null  object
 5   type            29788 non-null  object
 6   flags           29788 non-null  object
 7   prototypeToken  2023 non-null   object
 8   cha             29788 non-null  object
 9   con             29788 non-null  object
 10  dex             29788 non-null  object
 11  int             29788 non-null  object
 12  str             29788 non-null  object
 13  wis             29788 non-null  object
 14  ac              29788 non-null  int64 
 15  hp              29788 non-null  int64 
dtypes: int64(2), object(14)
memory usage: 3.9+ MB
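The `info()` outputs above show the row count growing with every join (1206, then 3412, 10014 and 29788), although no creatures were added. That pattern is consistent with `DataFrame.join` matching on index labels that are not unique: every row with a given label on the left is paired with every row carrying that label on the right. A minimal sketch on toy frames (hypothetical data) reproduces the effect and shows one possible remedy, assuming both frames hold the same rows in the same order.

```python
import pandas as pd

# Label 0 appears twice in both frames, as after concat without ignore_index.
left = pd.DataFrame({"name": ["a", "b", "c"]}, index=[0, 1, 0])
right = pd.DataFrame({"ac": [10, 11, 12]}, index=[0, 1, 0])

# Label 0 yields 2 x 2 rows, label 1 yields 1 x 1 row.
joined = left.join(right)
print(len(joined))    # 5

# With fresh positional labels, rows pair up one to one.
aligned = left.reset_index(drop=True).join(right.reset_index(drop=True))
print(len(aligned))   # 3
print(aligned["ac"].tolist())  # [10, 11, 12]
```

Checking `bestiary.index.is_unique` before joining is a cheap way to detect this situation.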


In [None]:
# Drop the raw columns that are not used as model features.
bestiary.drop(
    columns=["prototypeToken", "flags", "type", "system", "name", "items", "img", "_id"],
    inplace=True,
)

## Extracting the predicted value *level*

In [83]:
level = unpack_column(unpack_column(bestiary_system, "details"), "level")

In [None]:
level.columns = ["level"]

In [87]:
level.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1206 entries, 0 to 364
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   level   1206 non-null   int64
dtypes: int64(1)
memory usage: 51.1 KB


In [88]:
level.head()

|   | level |
|---|-------|
| 0 | 10 |
| 1 | 10 |
| 2 | 8 |
| 3 | 17 |
| 4 | 1 |
