-
Notifications
You must be signed in to change notification settings - Fork 1
/
reticulate.qmd
136 lines (102 loc) · 3.21 KB
/
reticulate.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
---
title: "chatGPT"
subtitle: "파이썬 환경구축"
author:
- name: 이광춘
url: https://www.linkedin.com/in/kwangchunlee/
affiliation: 한국 R 사용자회
affiliation-url: https://github.com/bit2r
title-block-banner: true
#title-block-banner: "#562457"
format:
html:
css: css/quarto.css
theme: flatly
code-fold: true
toc: true
toc-depth: 3
toc-title: 목차
number-sections: true
highlight-style: github
self-contained: false
filters:
- lightbox
- interview-callout.lua
lightbox: auto
link-citations: yes
knitr:
opts_chunk:
message: false
warning: false
collapse: true
comment: "#>"
R.options:
knitr.graphics.auto_pdf: true
editor_options:
chunk_output_type: console
---
![](images/python_with_R.png)
# 파이썬 환경 설정
`reticulate` 패키지로 콘다 파이썬 환경을 구축한다.
필요한 경우 패키지도 설치한다.
[[Riddhiman (Apr 19, 2022), 'Getting started with Python using R and reticulate'](https://rtichoke.netlify.app/post/getting_started_with_reticulate/)]{.aside}
```{r}
# install.packages("reticulate")
library(reticulate)
# conda_list()
use_condaenv(condaenv = "r-reticulate")
# py_install(packages = c("pandas", "scikit-learn"))
```
# 데이터 가져오기
[펭귄 데이터](https://raw.githubusercontent.com/dataprofessor/data/master/penguins_cleaned.csv)를 다운로드 받아 로컬 컴퓨터 `data` 폴더에 저장시킨다.
```{r}
library(tidyverse)
fs::dir_create("data")
download.file(url = "https://raw.githubusercontent.com/dataprofessor/data/master/penguins_cleaned.csv", destfile = "data/penguins_cleaned.csv")
penguin_df <- readr::read_csv("data/penguins_cleaned.csv")
penguin_df
```
# 파이썬 기계학습 모형
파이썬 sklearn 패키지로 펭귄 성별예측 모형을 구축하자.
:::{.panel-tabset}
## 파이썬 코드
```python
# "code/penguin_sex_clf.py"
import pandas as pd
penguins = pd.read_csv('data/penguins_cleaned.csv')
penguins_df = penguins[['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g', 'sex']]
# Ordinal feature encoding
# https://www.kaggle.com/pratik1120/penguin-dataset-eda-classification-and-clustering
df = penguins_df.copy()
target_mapper = {'male':0, 'female':1}
def target_encode(val):
return target_mapper[val]
df['sex'] = df['sex'].apply(target_encode)
# Separating X and Y
X = df.drop('sex', axis=1)
Y = df['sex']
# Build random forest model
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_estimators=100)
clf.fit(X, Y)
```
## R 환경 불러오기
```{r}
source_python("code/penguin_sex_clf.py")
clf
```
:::
# 시각화
파이썬 기계학습 결과를 R로 가져와서 변수 중요도를 시각화한다.
```{r}
feat_tbl <- tibble(features = clf$feature_names_in_,
importance = clf$feature_importances_)
feat_tbl %>%
ggplot(aes(x = fct_reorder(features, importance), y = importance)) +
geom_point(size = 3) +
geom_segment( aes(x=features, xend=features, y=0, yend=importance)) +
labs(y = "Feature 중요도", x = "Feature",
title = "펭귄 암수 예측모형 Feature 중요도") +
coord_flip() +
theme_bw(base_family = "AppleGothic")
```