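## Python Data Types

 * Numeric types
     * Python 3: integer (int) and floating-point (float)
     * Python 2: integer (int, long) and floating-point (float); there is no separate double type
 * Boolean (bool): True/False
 * Strings
     * Python 3: str (text), bytes (raw bytes)
     * Python 2: unicode (text), str (raw bytes)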
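
As a quick check of the types listed above, the following sketch inspects a few literals with the built-in `type()` under Python 3. The variable names are illustrative only.

```python
# Python 3: inspect the built-in type of a few literals.
samples = [10, 3.14, True, "hello", b"hello"]
type_names = [type(value).__name__ for value in samples]
print(type_names)  # ['int', 'float', 'bool', 'str', 'bytes']

# Python 3 has a single arbitrary-precision int (no separate long).
big = 2 ** 100
print(type(big).__name__)  # int

# str holds text and bytes holds raw bytes; encoding converts between them.
encoded = "hello".encode("utf-8")
print(encoded == b"hello")  # True
```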

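## Python Basic Data Structures

 * List (list): ordered and mutable
 * Tuple (tuple): ordered and immutable
 * Set (set): unordered; duplicates are removed automatically
 * Dictionary (dict): stores key/value pairs; insertion order is guaranteed from Python 3.7 onward, but not in earlier versions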
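
The properties above can be sketched in a few lines. The variable names are illustrative, and the dict ordering shown assumes Python 3.7 or later.

```python
numbers_list = [3, 1, 3]           # list: ordered, mutable
numbers_list.append(2)             # now [3, 1, 3, 2]

numbers_tuple = (3, 1, 3)          # tuple: ordered, immutable
tuple_is_immutable = False
try:
    numbers_tuple[0] = 9           # item assignment is not allowed
except TypeError:
    tuple_is_immutable = True

unique_numbers = set(numbers_list)  # set: duplicates dropped, {1, 2, 3}

word_lengths = {"time": 4, "flies": 5}  # dict: key/value pairs
word_lengths["arrow"] = 5
print(list(word_lengths))  # ['time', 'flies', 'arrow'] on Python 3.7+
```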

In [4]:
words = ["w1", "w2", "w3", "w4", "w5", "w1", "w2"] # list

In [3]:
words

['w1', 'w2', 'w3', 'w4', 'w5', 'w1', 'w2']

In [5]:
set(words)  # convert the list to a set

{'w1', 'w2', 'w3', 'w4', 'w5'}

In [6]:
len(words), len(set(words))

(7, 5)

In [7]:
def lexical_diversity(text):
    return len(text) / len(set(text))

In [9]:
mytext = "hello world"
set(mytext)

{' ', 'd', 'e', 'h', 'l', 'o', 'r', 'w'}

In [10]:
lexical_diversity('hello world')

1.375
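
Passing a string makes `lexical_diversity` count characters, because iterating over a str yields single characters. A minimal sketch of a word-level measurement follows, assuming whitespace tokenization with `str.split()` and Python 3 division. The function is repeated so the block runs on its own, and the sample sentence is made up for illustration.

```python
def lexical_diversity(text):
    # Average number of times each distinct item occurs.
    return len(text) / len(set(text))

sentence = "the cat saw the dog and the bird"
char_level = lexical_diversity(sentence)          # items are characters
word_level = lexical_diversity(sentence.split())  # items are words

# 8 tokens over 6 distinct words
print(word_level)
```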

In [12]:
# In Python 2, dividing two integers yields an integer; in Python 3 it yields a float.
# Python 2
# 10 / 3   # 3 (int)
# Python 3
# 10 / 3  # 3.333... (float)
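
To make the intent explicit regardless of version, Python offers the floor-division operator `//` alongside `/`. A short sketch, run under Python 3:

```python
true_quotient = 10 / 3     # Python 3: true division, always a float
floor_quotient = 10 // 3   # floor division: 3 in both Python 2 and 3
remainder = 10 % 3         # 1

# Floor division rounds toward negative infinity, not toward zero.
negative_floor = -7 // 2   # -4

print(true_quotient, floor_quotient, remainder, negative_floor)
```

When the result feeds into a ratio such as `lexical_diversity`, `/` under Python 3 is usually what is wanted. In Python 2 the same behaviour can be enabled with `from __future__ import division` at the top of the module.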

In [18]:
text7 = "hello hello hello oahi hel"

In [19]:
text7.count("hel")

4

In [20]:
text7.count("hello")

3
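
`str.count` counts non-overlapping occurrences of a substring, which is why "hel" matches inside each "hello" as well as the final token. To count whole words instead, one option is to split first and count on the resulting list. A sketch, redefining `text7` so the block is self-contained:

```python
text7 = "hello hello hello oahi hel"

substring_hits = text7.count("hel")           # 4: matches inside "hello" too
whole_word_hits = text7.split().count("hel")  # 1: exact token match only

# Occurrences are counted without overlap.
non_overlapping = "aaaa".count("aa")          # 2, not 3

print(substring_hits, whole_word_hits, non_overlapping)
```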

## Comprehension

### List Comprehension

In [22]:
mylist = []
for i in range(10):
    if i % 2 == 0:
        mylist.append(i**2)
print(mylist)

[0, 4, 16, 36, 64]


In [45]:
[i**2 for i in range(10) if i % 2 == 0]  # list comprehension syntax

[0, 4, 16, 36, 64]

In [46]:
[i for i in range(10)]

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [47]:
[ch.upper() for ch in "hello world"]

['H', 'E', 'L', 'L', 'O', ' ', 'W', 'O', 'R', 'L', 'D']

In [48]:
"hello world python django".split()

['hello', 'world', 'python', 'django']

In [49]:
[word.title() for word in "hello world python django".split()]

['Hello', 'World', 'Python', 'Django']

In [26]:
[ i % 10 for i in range(100) if i % 2 == 0 ]

[0, 2, 4, 6, 8, 0, 2, 4, 6, 8,
 0, 2, 4, 6, 8, 0, 2, 4, 6, 8,
 0, 2, 4, 6, 8, 0, 2, 4, 6, 8,
 0, 2, 4, 6, 8, 0, 2, 4, 6, 8,
 0, 2, 4, 6, 8, 0, 2, 4, 6, 8]

### Set Comprehension

In [25]:
{ i % 10 for i in range(100) if i % 2 == 0 }

{0, 2, 4, 6, 8}

### Dict Comprehension

In [29]:
from nltk.book import FreqDist

*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908


In [31]:
fdist1 = FreqDist('''A desperate America seeking distraction from an ugly political climate may have found a new sweetheart. He is Kenneth Bone, an undecided voter in a bright red sweater.

Whether that affection will last once he makes up his mind remains to be seen. In an interview with The New York Times, he revealed how he is now leaning.

Mr. Bone, 34, an operator at a coal plant in Illinois, was one of the undecided voters selected to ask a question at the town hall debate broadcast live on Sunday night.

He offered a contrast to the presidential candidates’ combative tone when he asked a straightforward policy query near the end of the 90-minute live broadcast.

“What steps will your energy policy take to meet our energy needs while at the same time remaining environmentally friendly and minimizing job layoffs?” he asked.

Judging by comments on social media, many of those who tuned in found Mr. Bone to be the most diverting thing about the debate. They were delighted with his sweater and images of him snapping pictures on a disposable camera shortly after the event.

Journalists and commentators flooded Twitter with memes, depicting Mr. Bone crossing the Delaware with George Washington, as a rapper or as the basis for the perfect Halloween costume. A YouTube song celebrated him (“Oh Kenneth Bone, you make us all feel less alone in this bizarro phantom zone in the darkest of timelines”), while others cautioned that he might wear out his welcome, like an election edition of Chewbacca Mom.

In a phone interview on Monday morning, Mr. Bone said that he had been leaning toward voting for Donald J. Trump, but that Hillary Clinton “really impressed me with her composure and some of her answers last night.”''')

In [32]:
fdist1.keys()

dict_keys(['k', '4', 'g', 'h', 'w', '?', 'A', 'm', 'W', 'f', 'j', 'e', '(', 'I', 'l', 'Y', 'N', '0', ',', 'c', 'v', 'b', '\n', 'q', 'r', ')', '“', 'T', 'a', 'd', '’', '9', 'G', 'u', 'o', 'C', '-', 'y', 'z', 'M', 'D', '.', '3', 'n', 's', 'i', 'p', 't', '”', 'H', 'J', 'B', 'S', ' ', 'K', 'O'])
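
Because `FreqDist` was given a plain string, its keys are individual characters. To count words, a list of tokens would be passed instead. The sketch below uses the standard library's `collections.Counter`, which `nltk.FreqDist` subclasses in NLTK 3, so the counting behaviour can be tried without NLTK installed. The sample sentence and the whitespace tokenization are assumptions for illustration.

```python
from collections import Counter

sample = "the undecided voter asked the candidates about the energy policy"

char_counts = Counter(sample)          # iterating a str yields characters
word_counts = Counter(sample.split())  # iterating a list yields words

print(word_counts.most_common(1))  # [('the', 3)]
print(char_counts["e"])            # number of times 'e' appears
```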

In [36]:
tuple(fdist1.keys())[:50]

('k', '4', 'g', 'h', 'w', '?', 'A', 'm', 'W', 'f',
 'j', 'e', '(', 'I', 'l', 'Y', 'N', '0', ',', 'c',
 'v', 'b', '\n', 'q', 'r', ')', '“', 'T', 'a', 'd',
 '’', '9', 'G', 'u', 'o', 'C', '-', 'y', 'z', 'M',
 'D', '.', '3', 'n', 's', 'i', 'p', 't', '”', 'H')

In [37]:
FreqDist?

In [38]:
sent4 = ["Fellow", "-", "Citizens", "of", "the", "Senate", "and", "of", "the", "House", "of", "Representatives", ":"]

In [39]:
sent4[2]

'Citizens'

In [40]:
def revString(text):
    return text[::-1]

In [42]:
revString(sent4[2])

'snezitiC'

In [43]:
def revStringList(mylist):
    for text in mylist:
        print(text[::-1])

In [44]:
revStringList(["hello", "world", "python", "django"])

olleh
dlrow
nohtyp
ognajd
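
The functions above reverse one string at a time or print the results. A sketch of the same reversal written as comprehensions follows, including a dict comprehension, which uses the `{key: value for ...}` form. The names `reversed_list`, `reversed_map`, and `long_word_lengths` are illustrative.

```python
words = ["hello", "world", "python", "django"]

# List comprehension: collect the reversed strings instead of printing them.
reversed_list = [word[::-1] for word in words]

# Dict comprehension: map each word to its reversed form.
reversed_map = {word: word[::-1] for word in words}

# Dict comprehension with a filter: lengths of words longer than 5 characters.
long_word_lengths = {word: len(word) for word in words if len(word) > 5}

print(reversed_map["python"])  # nohtyp
print(long_word_lengths)       # {'python': 6, 'django': 6}
```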


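## Slicing Syntax

 * mylist[start:stop:step] : from start (inclusive) to stop (exclusive), advancing by step
 * mylist[start:] : from start to the end
 * mylist[start:stop] : from start (inclusive) to stop (exclusive)
 * mylist[:stop] : from the beginning up to, but not including, stop
 * mylist[::step] : from the beginning to the end, advancing by step
      * When step is negative, traversal starts at the end and moves toward the beginning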
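
Each slicing form can be tried on a small list. The sketch below uses `list(range(10))` as sample data; the same syntax applies to strings and tuples.

```python
mylist = list(range(10))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

print(mylist[2:8:2])   # [2, 4, 6]
print(mylist[7:])      # [7, 8, 9]
print(mylist[2:5])     # [2, 3, 4]
print(mylist[:3])      # [0, 1, 2]
print(mylist[::3])     # [0, 3, 6, 9]
print(mylist[::-1])    # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(mylist[8:2:-2])  # [8, 6, 4]
```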

### Exercise 3

In [50]:
def wordlength():
    # raw_input("Enter some text:") # python2
    line = input("Enter some text:") # python3
    result = []
    for word in line.split():
        result.append((word, len(word)))
    return result

In [51]:
wordlength()

Enter some text:time flies like an arrow


[('time', 4), ('flies', 5), ('like', 4), ('an', 2), ('arrow', 5)]

In [52]:
def wordlength2():
    line = input("Enter some text :")
    result = [(word, len(word)) for word in line.split()]
    return result

In [53]:
wordlength2()

Enter some text :time flies like an arrow


[('time', 4), ('flies', 5), ('like', 4), ('an', 2), ('arrow', 5)]
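
Both versions above read from `input()`, which makes them awkward to reuse or test. One possible variant takes the line as a parameter instead; the name `word_lengths` is hypothetical and not part of the original exercise.

```python
def word_lengths(line):
    """Return (word, length) pairs for a whitespace-separated line."""
    return [(word, len(word)) for word in line.split()]

pairs = word_lengths("time flies like an arrow")
print(pairs)
# [('time', 4), ('flies', 5), ('like', 4), ('an', 2), ('arrow', 5)]
```

The interactive behaviour is still available by calling `word_lengths(input("Enter some text:"))`.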