### An MNIST classifier with Keras (fully connected layers)
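Steps for building a neural network in Keras  
1) `Sequential` class -> create the model object  
2) `add` method -> add layers (one at a time)  
  - add them in order, starting from the input
  - check the shapes
  - set each layer's activation function  
  - the output size of one layer is the input size of the next, e.g. 2 input variables -> 4 units; a layer with 10 outputs -> the next layer takes 10 inputs; 20 outputs -> 20 inputs

3) `compile` method -> configure the model for training
  - loss, optimizer (the update algorithm), and the metric used to judge performance

4) `fit` method -> training
  - set the number of epochs (with 60,000 samples, 1 epoch = one pass over all 60,000 samples)
  - set the batch size (the data is split into mini-batches; input pipelines such as queue runners can feed them)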
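The shape-chaining rule in step 2 can be checked with a few lines of arithmetic. The sketch below (plain Python, no Keras; `dense_stack_params` is a hypothetical helper) walks a stack of fully connected layers, feeds each layer's output size into the next layer's input size, and counts weights plus one bias per output unit. For the 784 -> 512 -> 10 network built below, these numbers should match the `Param #` column of `network.summary()`.

```python
def dense_stack_params(input_dim, units):
    """Return (inputs, outputs, params) for each layer in a stack of dense layers."""
    layers = []
    fan_in = input_dim
    for n in units:
        params = fan_in * n + n   # weight matrix (fan_in x n) plus one bias per output unit
        layers.append((fan_in, n, params))
        fan_in = n                # this layer's output size becomes the next layer's input size
    return layers

layers = dense_stack_params(28 * 28, [512, 10])
for i, (n_in, n_out, p) in enumerate(layers, start=1):
    print(f"layer {i}: {n_in} -> {n_out}, {p} params")
print("total:", sum(p for _, _, p in layers))   # total: 407050
```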

In [40]:
from keras.datasets import mnist

In [41]:
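(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images.shape  # (60000, 28, 28): 60,000 training images of 28x28 pixels
test_labels         # digit labels 0-9 for the 10,000 test images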

array([7, 2, 1, ..., 4, 5, 6], dtype=uint8)

In [42]:
from keras import models, layers

In [43]:
network=models.Sequential() # a single network (model)
# Add one layer: layers.Dense(number of output units, activation=..., input_shape=...); the first layer must declare its input_shape
network.add(layers.Dense(512, activation='relu', input_shape=(28*28,)))
# Output layer: there are 10 classes (digits 0-9), so use 10 units with softmax
network.add(layers.Dense(10, activation='softmax'))
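To make the two activations concrete, here is a minimal pure-Python sketch of what one dense layer computes, followed by ReLU (hidden layer) and softmax (output layer). The helper names and the tiny 3 -> 2 -> 3 weights are made up for illustration; Keras does the same math with matrices over whole batches.

```python
import math

def dense(inputs, weights, biases):
    # weights[j][i] connects input i to output unit j: out_j = sum_i w_ji * x_i + b_j
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]

def relu(xs):
    # negative values become 0, positive values pass through unchanged
    return [max(0.0, x) for x in xs]

def softmax(xs):
    m = max(xs)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]     # non-negative values that sum to 1

inputs = [1.0, 2.0, 3.0]
hidden = relu(dense(inputs, [[0.1, 0.2, 0.3], [-0.5, -0.5, -0.5]], [0.0, 0.0]))
probs = softmax(dense(hidden, [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]], [0.0, 0.0, 0.0]))
print(hidden)   # second unit is clipped to 0 by ReLU
print(probs)    # class 0 gets the highest probability
```

The predicted class is the index of the largest softmax output, which is why the MNIST output layer has exactly one unit per digit.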

In [44]:
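# Compile (configure the model for training; the actual training happens in fit)
# you must specify which optimizer, cost (loss) function, and performance metric to use
network.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])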

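#### Optimization: updating the parameters in the direction that minimizes the cost
1) Gradient descent (SGD): new weight = weight − learning rate × gradient
  - It always steps straight down the current slope, so on complex models (irregular loss surfaces) it tends to zigzag and converge slowly  
  
2) Momentum: keeps a velocity term; when the velocity is large, the update is large

3) AdaGrad: divides each parameter's learning rate by its accumulated squared gradients (RMSProp: uses a moving average instead, so mostly the recent gradients shape the learning rate)

4) Adam (momentum + AdaGrad/RMSProp-style adaptive learning rates)  
=> Adam often performs well in practice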
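The update rules above can be compared on a toy problem. The sketch below minimizes the one-dimensional loss L(w) = (w − 3)² with textbook versions of each rule; the loss, starting point, learning rates, and step counts are assumptions chosen for illustration, and Keras's own optimizer classes differ in details such as default hyperparameters and where epsilon is added.

```python
import math

def grad(w):
    # gradient of the toy loss L(w) = (w - 3)^2, whose minimum is at w = 3
    return 2 * (w - 3)

def sgd(w, lr=0.1, steps=200):
    for _ in range(steps):
        w -= lr * grad(w)                      # step against the gradient
    return w

def momentum(w, lr=0.1, beta=0.9, steps=200):
    v = 0.0
    for _ in range(steps):
        v = beta * v - lr * grad(w)            # velocity accumulates past gradients
        w += v
    return w

def adagrad(w, lr=0.5, eps=1e-8, steps=200):
    h = 0.0
    for _ in range(steps):
        g = grad(w)
        h += g * g                             # sum of ALL past squared gradients
        w -= lr * g / (math.sqrt(h) + eps)     # so the effective step keeps shrinking
    return w

def rmsprop(w, lr=0.1, rho=0.9, eps=1e-8, steps=200):
    h = 0.0
    for _ in range(steps):
        g = grad(w)
        h = rho * h + (1 - rho) * g * g        # moving average: old gradients fade out
        w -= lr * g / (math.sqrt(h) + eps)
    return w

def adam(w, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, steps=200):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g              # momentum-style average of gradients
        v = b2 * v + (1 - b2) * g * g          # RMSProp-style average of squared gradients
        m_hat = m / (1 - b1 ** t)              # bias correction for the zero initialization
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

for opt in (sgd, momentum, adagrad, rmsprop, adam):
    print(f"{opt.__name__:8s} w = {opt(0.0):.4f}")
```

All five move w from 0 toward 3. With a constant learning rate, RMSProp and Adam normalize the step size and may hover slightly around the minimum instead of settling exactly on it.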

In [45]:
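# before reshaping: train_images.shape == (60000, 28, 28)
train_images=train_images.reshape((60000,28*28))  # flatten each image into a 784-dim vector
train_images=train_images.astype('float32')/255 # normalize: pixel values range from 0 to 255, so dividing by 255 maps them into [0, 1]

# before reshaping: test_images.shape == (10000, 28, 28)
test_images=test_images.reshape((10000,28*28))
test_images=test_images.astype('float32')/255 # normalize the same way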

In [46]:
# Convert the train and test labels to one-hot vectors
from keras.utils import to_categorical
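One-hot encoding turns each class index into a vector with a single 1 at that index, which is the target format `categorical_crossentropy` expects. A minimal pure-Python sketch of the idea (`to_one_hot` is a hypothetical helper, not the Keras function); like `to_categorical`, it infers the number of classes as the largest label + 1 when none is given.

```python
def to_one_hot(labels, num_classes=None):
    # infer the number of classes from the data when it is not given
    if num_classes is None:
        num_classes = max(labels) + 1
    encoded = []
    for y in labels:
        row = [0.0] * num_classes
        row[y] = 1.0          # a single 1 at the position of the class index
        encoded.append(row)
    return encoded

for row in to_one_hot([7, 2, 1], num_classes=10):
    print(row)
```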

In [47]:
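# run the conversion only once: re-running this cell would encode the already one-hot labels again
train_labels=to_categorical(train_labels)
test_labels=to_categorical(test_labels)
network.fit(train_images, train_labels, epochs=5, batch_size=128)
# epochs and batch_size can be tuned as needed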

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x1dac5c129b0>
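With `batch_size=128`, each epoch over the 60,000 training images is split into mini-batches, and the weights are updated once per batch. A small sketch of that arithmetic, assuming every epoch uses the full training set and the final, smaller batch is kept rather than dropped:

```python
import math

def steps_per_epoch(num_samples, batch_size):
    # the last mini-batch may be smaller than batch_size, so round up
    return math.ceil(num_samples / batch_size)

def last_batch_size(num_samples, batch_size):
    remainder = num_samples % batch_size
    return remainder if remainder else batch_size

steps = steps_per_epoch(60000, 128)
print(steps)                          # 469
print(last_batch_size(60000, 128))    # 96
print(steps * 5)                      # 2345 weight updates over 5 epochs
```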

In [49]:
test_cost, test_acc=network.evaluate(test_images,test_labels)



In [50]:
print(test_acc)

0.9744
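A test accuracy of 0.9744 means 9,744 of the 10,000 test images were classified correctly. The sketch below shows what such an accuracy number measures, assuming (as with one-hot labels and a softmax output) that a prediction counts as correct when the most probable class equals the true class. The helper names and the four sample predictions are made up.

```python
def argmax(xs):
    # index of the largest value (the first one wins on ties)
    return max(range(len(xs)), key=lambda i: xs[i])

def accuracy(prob_rows, one_hot_rows):
    # fraction of samples whose most probable class matches the true class
    correct = sum(argmax(p) == argmax(y) for p, y in zip(prob_rows, one_hot_rows))
    return correct / len(prob_rows)

probs = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4], [0.2, 0.5, 0.3]]
labels = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]
print(accuracy(probs, labels))  # 0.75 (the third prediction is wrong)
```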


### Natural language processing

In [2]:
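import nltk
nltk.download()  # opens the interactive NLTK data downloader; single resources can also be fetched, e.g. nltk.download('punkt')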

showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml


True

In [51]:
# pip install konlpy
# from konlpy.tag import Twitter   (the Twitter tagger has since been renamed Okt)

In [10]:
# English: nltk
# Korean: konlpy
from nltk.tokenize import word_tokenize # splits text into word tokens
print(word_tokenize("How are you?"))
print(word_tokenize("Don't, touch, me"))

from nltk.tokenize import WordPunctTokenizer
print(WordPunctTokenizer().tokenize("Don't, touch, me"))



['How', 'are', 'you', '?']
['Do', "n't", ',', 'touch', ',', 'me']
['Don', "'", 't', ',', 'touch', ',', 'me']
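The `WordPunctTokenizer` output above can be reproduced with a single regular expression. The sketch assumes NLTK's pattern is `\w+|[^\w\s]+` (runs of word characters, or runs of characters that are neither word characters nor whitespace), which explains why "Don't" falls apart into `Don`, `'`, `t`, while `word_tokenize` applies contraction rules and keeps `n't` together.

```python
import re

# Assumption: mirrors the pattern used by NLTK's WordPunctTokenizer.
WORD_PUNCT = re.compile(r"\w+|[^\w\s]+")

def word_punct_tokenize(text):
    return WORD_PUNCT.findall(text)

print(word_punct_tokenize("Don't, touch, me"))
# ['Don', "'", 't', ',', 'touch', ',', 'me']
print(word_punct_tokenize("How are you?"))
# ['How', 'are', 'you', '?']
```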


In [4]:
from konlpy.tag import Okt
okt=Okt()
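okt.pos("아버지 가방에 들어가신다")  # classic spacing example: "Father goes into the bag" (correctly spaced, 아버지가 방에 = "Father goes into the room")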

[('아버지', 'Noun'), ('가방', 'Noun'), ('에', 'Josa'), ('들어가신다', 'Verb')]

In [13]:
from konlpy.tag import Hannanum

In [14]:
hannanum=Hannanum()
print(hannanum.analyze(u'롯데마트의 흑마늘 양념 치킨이 논란이 되고 있다.'))  # "Lotte Mart's black garlic seasoned chicken is becoming controversial."

[[[('롯데마트', 'ncn'), ('의', 'jcm')], [('롯데마트의', 'ncn')], [('롯데마트', 'nqq'), ('의', 'jcm')], [('롯데마트의', 'nqq')]], [[('흑마늘', 'ncn')], [('흑마늘', 'nqq')]], [[('양념', 'ncn')]], [[('치킨', 'ncn'), ('이', 'jcc')], [('치킨', 'ncn'), ('이', 'jcs')], [('치킨', 'ncn'), ('이', 'ncn')]], [[('논란', 'ncpa'), ('이', 'jcc')], [('논란', 'ncpa'), ('이', 'jcs')], [('논란', 'ncpa'), ('이', 'ncn')]], [[('되', 'nbu'), ('고', 'jcj')], [('되', 'nbu'), ('이', 'jp'), ('고', 'ecc')], [('되', 'nbu'), ('이', 'jp'), ('고', 'ecs')], [('되', 'nbu'), ('이', 'jp'), ('고', 'ecx')], [('되', 'paa'), ('고', 'ecc')], [('되', 'paa'), ('고', 'ecs')], [('되', 'paa'), ('고', 'ecx')], [('되', 'pvg'), ('고', 'ecc')], [('되', 'pvg'), ('고', 'ecs')], [('되', 'pvg'), ('고', 'ecx')], [('되', 'px'), ('고', 'ecc')], [('되', 'px'), ('고', 'ecs')], [('되', 'px'), ('고', 'ecx')]], [[('있', 'paa'), ('다', 'ef')], [('있', 'px'), ('다', 'ef')]], [[('.', 'sf')], [('.', 'sy')]]]


In [23]:
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.tag import pos_tag
text="Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.[27] Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including procedural, object-oriented, and functional programming. Python is often described as a batteries included language due to its comprehensive standard library.[28]"
print(sent_tokenize(text))
print("="*50)
print(word_tokenize(text))
print("="*50)
print(pos_tag(word_tokenize(text)))

['Python is an interpreted, high-level, general-purpose programming language.', "Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace.", 'Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.', '[27] Python is dynamically typed and garbage-collected.', 'It supports multiple programming paradigms, including procedural, object-oriented, and functional programming.', 'Python is often described as a batteries included language due to its comprehensive standard library.', '[28]']
['Python', 'is', 'an', 'interpreted', ',', 'high-level', ',', 'general-purpose', 'programming', 'language', '.', 'Created', 'by', 'Guido', 'van', 'Rossum', 'and', 'first', 'released', 'in', '1991', ',', 'Python', "'s", 'design', 'philosophy', 'emphasizes', 'code', 'readability', 'with', 'its', 'notable', 'use', 'of', 'significan

In [32]:
from konlpy.tag import Kkma
kma=Kkma()

In [34]:
# "Everyone who studied hard without being late again today, let's keep it up this week too."
sentence = "오늘도 지각하지 않고 열심히 공부한 여러분, 이번주도 힘냅시다."
print(okt.morphs(sentence))   # morphemes (Okt)
print("="*30)
print(kma.morphs(sentence))   # morphemes (Kkma)
print("="*30)
print(okt.pos(sentence))      # (morpheme, part-of-speech) pairs
print("="*30)
print(okt.nouns(sentence))    # nouns only
print("="*30)

['오늘', '도', '지각', '하지', '않고', '열심히', '공부', '한', '여러분', ',', '이번', '주도', '힘냅시다', '.']
['오늘', '도', '지각', '하', '지', '않', '고', '열심히', '공부', '하', 'ㄴ', '여러분', ',', '이번', '주도', '힘내', 'ㅂ시다', '.']
[('오늘', 'Noun'), ('도', 'Josa'), ('지각', 'Noun'), ('하지', 'Verb'), ('않고', 'Verb'), ('열심히', 'Adverb'), ('공부', 'Noun'), ('한', 'Josa'), ('여러분', 'Noun'), (',', 'Punctuation'), ('이번', 'Noun'), ('주도', 'Noun'), ('힘냅시다', 'Verb'), ('.', 'Punctuation')]
['오늘', '지각', '공부', '여러분', '이번', '주도']


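#### NLP preprocessing
- Normalize case, remove (or replace) stopwords, handle special characters (whitespace; preserve sentence boundaries)
- train, training, trains, ... => train (stemming)
- are/is/was/were => be (lemmatization)
- Regular expressions
- Morpheme: the smallest unit of meaning; a word is built from a stem plus affixes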
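Several of these steps (case normalization, regular expressions, stopword removal) can be chained into one small function. This is a sketch under stated assumptions: `STOPWORDS` is a tiny hypothetical list (NLTK's English list is much larger), and the regex keeps hyphenated compounds such as `garbage-collected` as single tokens while dropping citation markers like `[27]` and punctuation.

```python
import re

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"is", "an", "and", "the", "a", "it", "its"}

def preprocess(text):
    text = text.lower()                                      # case normalization
    text = re.sub(r"\[\d+\]", " ", text)                     # drop citation markers such as [27]
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text)   # words, keeping hyphenated compounds
    return [t for t in tokens if t not in STOPWORDS]         # stopword removal

print(preprocess("Python is dynamically typed and garbage-collected.[27]"))
# ['python', 'dynamically', 'typed', 'garbage-collected']
```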

In [35]:
from nltk.stem import WordNetLemmatizer

In [38]:
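# handy for reducing different forms of a word to one base form (lemma); the second argument "v" marks the word as a verb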
wnl=WordNetLemmatizer()
print(wnl.lemmatize("has","v"))
print(wnl.lemmatize("were","v"))
print(wnl.lemmatize("was","v"))

have
be
be
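Irregular forms such as was/were => be cannot be produced by cutting off suffixes, so lemmatizers rely on dictionary data. As a rough illustration only, the sketch below uses a small hypothetical lookup table of irregular verb forms; the WordNet data behind `WordNetLemmatizer` covers far more words and parts of speech.

```python
# Hypothetical exception table for a few irregular verb forms (illustration only).
IRREGULAR_VERBS = {
    "am": "be", "is": "be", "are": "be", "was": "be", "were": "be", "been": "be",
    "has": "have", "had": "have",
}

def toy_lemmatize_verb(word):
    word = word.lower()
    # look up irregular forms; anything unknown is returned unchanged
    return IRREGULAR_VERBS.get(word, word)

print([toy_lemmatize_verb(w) for w in ["has", "were", "was", "run"]])
# ['have', 'be', 'be', 'run']
```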


In [63]:
text="Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.[27] Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including procedural, object-oriented, and functional programming. Python is often described as a batteries included language due to its comprehensive standard library.[28]"

In [43]:
# Porter algorithm (stemming)
from nltk.stem import PorterStemmer
ps=PorterStemmer()
words=word_tokenize(text)
print(words)
print("="*30)

# stemming: reduce each token to its stem
ps_words=[ps.stem(w) for w in words]
print(ps_words)

['Python', 'is', 'an', 'interpreted', ',', 'high-level', ',', 'general-purpose', 'programming', 'language', '.', 'Created', 'by', 'Guido', 'van', 'Rossum', 'and', 'first', 'released', 'in', '1991', ',', 'Python', "'s", 'design', 'philosophy', 'emphasizes', 'code', 'readability', 'with', 'its', 'notable', 'use', 'of', 'significant', 'whitespace', '.', 'Its', 'language', 'constructs', 'and', 'object-oriented', 'approach', 'aim', 'to', 'help', 'programmers', 'write', 'clear', ',', 'logical', 'code', 'for', 'small', 'and', 'large-scale', 'projects', '.', '[', '27', ']', 'Python', 'is', 'dynamically', 'typed', 'and', 'garbage-collected', '.', 'It', 'supports', 'multiple', 'programming', 'paradigms', ',', 'including', 'procedural', ',', 'object-oriented', ',', 'and', 'functional', 'programming', '.', 'Python', 'is', 'often', 'described', 'as', 'a', 'batteries', 'included', 'language', 'due', 'to', 'its', 'comprehensive', 'standard', 'library', '.', '[', '28', ']']
['python', 'is', 'an', 'int
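Stemming works by stripping suffixes with rules. The sketch below is a deliberately naive suffix stripper, **not** the Porter algorithm (which applies several ordered rule phases with conditions on the remaining stem). It handles the train/training/trains case from the preprocessing notes, and it shows why irregular forms like "was" need lemmatization instead.

```python
# Naive suffix-stripping stemmer for illustration; NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "s")

def naive_stem(word, min_stem=3):
    for suffix in SUFFIXES:
        # strip the first matching suffix, but keep at least min_stem characters
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

print([naive_stem(w) for w in ["train", "training", "trains", "trained"]])
# ['train', 'train', 'train', 'train']
print(naive_stem("was"))   # 'was': suffix rules cannot map it to 'be'
```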

In [48]:
# NLTK's built-in list of English stopwords
from nltk.corpus import stopwords
sw=stopwords.words('english')
test="i need you to help me. i like coding. what's your hobby."
test=word_tokenize(test) # split into words, stored as a list
res=[w for w in test if w not in sw]  # keep only the words that are not stopwords
print(res)

['need', 'help', '.', 'like', 'coding', '.', "'s", 'hobby', '.']


In [54]:
sw="열심히 하기싫어 싫거든 안해"   # custom stopwords, separated by spaces
test="파이썬 코딩을 열심히 해야 합니다. 하기싫어도 해요. 그래도 싫거든 하세요 안해 안해 안해"
sw=sw.split(" ")
test=word_tokenize(test)
# matching is exact, so '하기싫어도' is kept even though '하기싫어' is a stopword
res2=[w for w in test if w not in sw]
print(res2)

['파이썬', '코딩을', '해야', '합니다', '.', '하기싫어도', '해요', '.', '그래도', '하세요']


In [75]:
# Integer encoding
text="Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.[27] Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including procedural, object-oriented, and functional programming. Python is often described as a batteries included language due to its comprehensive standard library.[28]"
text
# Task: count how many times each word appears
# - exclude words of length 2 or less
# - remove words in the stopword list
# - ignore case (convert everything to lowercase)

"Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.[27] Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including procedural, object-oriented, and functional programming. Python is often described as a batteries included language due to its comprehensive standard library.[28]"

In [83]:
def word_encode(text):
    # accept a raw string or a list of sentences (e.g. the output of sent_tokenize)
    sentences = text if isinstance(text, list) else [text]
    sw = set(stopwords.words('english'))
    counts = {}
    for sentence in sentences:
        for w in word_tokenize(sentence):
            w = w.lower()                      # ignore case
            if w in sw or len(w) <= 2:         # drop stopwords and words of length 2 or less
                continue
            counts[w] = counts.get(w, 0) + 1   # count each occurrence exactly once
    return counts
    

In [84]:
text=sent_tokenize(text)
word_encode(text)

{'python': 4,
 'interpreted': 1,
 'high-level': 1,
 'general-purpose': 1,
 'programming': 3,
 'language': 3,
 'created': 1,
 'guido': 1,
 'van': 1,
 'rossum': 1,
 'first': 1,
 'released': 1,
 '1991': 1,
 'design': 1,
 'philosophy': 1,
 'emphasizes': 1,
 'code': 2,
 'readability': 1,
 'notable': 1,
 'use': 1,
 'significant': 1,
 'whitespace': 1,
 'constructs': 1,
 'object-oriented': 2,
 'approach': 1,
 'aim': 1,
 'help': 1,
 'programmers': 1,
 'write': 1,
 'clear': 1,
 'logical': 1,
 'small': 1,
 'large-scale': 1,
 'projects': 1,
 'dynamically': 1,
 'typed': 1,
 'garbage-collected': 1,
 'supports': 1,
 'multiple': 1,
 'paradigms': 1,
 'including': 1,
 'procedural': 1,
 'functional': 1,
 'often': 1,
 'described': 1,
 'batteries': 1,
 'included': 1,
 'due': 1,
 'comprehensive': 1,
 'standard': 1,
 'library': 1}

In [92]:
text="Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.[27] Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including procedural, object-oriented, and functional programming. Python is often described as a batteries included language due to its comprehensive standard library.[28]"
text=sent_tokenize(text)

In [96]:
from nltk.tokenize import word_tokenize 
from nltk.corpus import stopwords
from collections import Counter
voc=Counter()
sentences=[]
stop_words=stopwords.words('english')
for i in text:
    sentence=word_tokenize(i)
    res=[]
    for word in sentence:
        word=word.lower()
        if word not in stop_words:
            if len(word) > 2:
                res.append(word)
                voc[word]=voc[word]+1
    sentences.append(res)
print(sentences)

[['python', 'interpreted', 'high-level', 'general-purpose', 'programming', 'language'], ['created', 'guido', 'van', 'rossum', 'first', 'released', '1991', 'python', 'design', 'philosophy', 'emphasizes', 'code', 'readability', 'notable', 'use', 'significant', 'whitespace'], ['language', 'constructs', 'object-oriented', 'approach', 'aim', 'help', 'programmers', 'write', 'clear', 'logical', 'code', 'small', 'large-scale', 'projects'], ['python', 'dynamically', 'typed', 'garbage-collected'], ['supports', 'multiple', 'programming', 'paradigms', 'including', 'procedural', 'object-oriented', 'functional', 'programming'], ['python', 'often', 'described', 'batteries', 'included', 'language', 'due', 'comprehensive', 'standard', 'library'], []]
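The cells above count words, and the encoding step itself then maps each word to an integer. One common convention, assumed in this sketch, gives index 1 to the most frequent word, 2 to the next, and so on, breaking ties by first appearance (`Counter.most_common` keeps first-seen order for equal counts). Each tokenized sentence then becomes a list of integers. `build_vocab` and `encode` are hypothetical helper names, and the three short sentences are a made-up sample in the same format as `sentences` above.

```python
from collections import Counter

def build_vocab(sentences):
    # count every token across all (already preprocessed) sentences
    counts = Counter(w for sent in sentences for w in sent)
    # most frequent word -> 1, next -> 2, ...; ties keep first-seen order
    return {word: idx for idx, (word, _) in enumerate(counts.most_common(), start=1)}

def encode(sentences, word_to_index):
    return [[word_to_index[w] for w in sent] for sent in sentences]

sample = [
    ["python", "interpreted", "programming", "language"],
    ["python", "dynamically", "typed"],
    ["programming", "python"],
]
vocab = build_vocab(sample)
print(vocab)
# {'python': 1, 'programming': 2, 'interpreted': 3, 'language': 4, 'dynamically': 5, 'typed': 6}
print(encode(sample, vocab))
# [[1, 3, 2, 4], [1, 5, 6], [2, 1]]
```

Applied to the `sentences` list above, the same two calls would give `python` the index 1, since it is the most frequent word.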


In [104]:
import folium
# myMap=folium.Map(location=[36.123,127.123],zoom_start=13,tiles='Stamen Terrain')
myMap=folium.Map(location=[36.123,127.123],zoom_start=13)
folium.Marker([36.123,127.123],popup="우리집").add_to(myMap)  # popup text: "my home"
myMap