<a href="https://colab.research.google.com/github/ancestor9/2025_Spring_Data-Management/blob/main/week_07/Text_Representation_and_Embedding_02.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 텍스트 표현 기법과 임베딩
# **Data Representation**

## <font color='orange'>**Text data**
- **아래 그림을 이해하여야 한다.**
<img src='https://blog.kakaocdn.net/dn/dEG3BX/btsFAu7l2sf/PdKddROM1YXb5NRclcNFBk/img.png'>

## 🎯 강의 목표
- 자연어 처리에서 사용되는 텍스트 표현 방법의 역사와 원리를 이해한다.
- Bag of Words, TF-IDF, Word Embedding 기법을 실습한다.
- Word2Vec과 같은 사전 학습된 임베딩을 적용해 본다.

## 💻 **실습**

#### 2.0 정수인코딩(Integer Encoding)
- 📌 구현 도구: `Tokenizer`
- 예를 들어 : 단어에 정수를 부여하는 방법 중 하나로 단어를 빈도수 순으로 정렬한 단어 집합(vocabulary)을 만들고, 빈도수가 높은 순서대로 차례로 낮은 숫자부터 정수를 부여하는 방법


In [3]:
raw_text = '''
A barber is a person. a barber is good person. a barber is huge person. he Knew A Secret! The Secret He Kept is huge secret. Huge secret. His barber kept his word. a barber kept his word. His barber kept his secret. But keeping and keeping such a huge secret to himself was driving the barber crazy. the barber went up a huge mountain.
'''

In [4]:
import nltk
nltk.download('punkt_tab') # 문장을 구분하거나 단어로 쪼갤 때 필요한 pre-trained 모델
from nltk.tokenize import sent_tokenize
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


In [5]:
# 문장 토큰화
sentences = sent_tokenize(raw_text)
sentences

['\nA barber is a person.',
 'a barber is good person.',
 'a barber is huge person.',
 'he Knew A Secret!',
 'The Secret He Kept is huge secret.',
 'Huge secret.',
 'His barber kept his word.',
 'a barber kept his word.',
 'His barber kept his secret.',
 'But keeping and keeping such a huge secret to himself was driving the barber crazy.',
 'the barber went up a huge mountain.']

In [6]:
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

In [7]:
# prompt: sentences에서 출현단어의 빈도를 Counters 모듈로 만들어줘

from collections import Counter

# 단어 토큰화 및 불용어 제거
vocab = Counter()
stop_words = set(stopwords.words('english'))

for sentence in sentences:
    tokens = word_tokenize(sentence)
    tokens = [word.lower() for word in tokens if word.isalnum()] # 특수문자 제거, 소문자 변환
    tokens = [w for w in tokens if not w in stop_words] # 불용어 제거
    vocab.update(tokens)

vocab

Counter({'barber': 8,
         'person': 3,
         'good': 1,
         'huge': 5,
         'knew': 1,
         'secret': 6,
         'kept': 4,
         'word': 2,
         'keeping': 2,
         'driving': 1,
         'crazy': 1,
         'went': 1,
         'mountain': 1})

In [8]:
vocab_sorted = sorted(vocab.items(), key = lambda x:x[1], reverse = True)
vocab_sorted

[('barber', 8),
 ('secret', 6),
 ('huge', 5),
 ('kept', 4),
 ('person', 3),
 ('word', 2),
 ('keeping', 2),
 ('good', 1),
 ('knew', 1),
 ('driving', 1),
 ('crazy', 1),
 ('went', 1),
 ('mountain', 1)]

#### 2.1 One-hot Encoding
- 📌 구현 도구: `CountVectorizer`



In [9]:
from tensorflow.keras.preprocessing.text import Tokenizer

text = "나랑 점심 먹으러 갈래 점심 메뉴는 햄버거가 최고야"

tokenizer = Tokenizer()
tokenizer.fit_on_texts([text])

# 단어 인덱스 확인
word_index = tokenizer.word_index
print("단어 인덱스:", word_index)

단어 인덱스: {'점심': 1, '나랑': 2, '먹으러': 3, '갈래': 4, '메뉴는': 5, '햄버거가': 6, '최고야': 7}


In [10]:
# 텍스트를 시퀀스로 변환
sequences = tokenizer.texts_to_sequences([text])
print("시퀀스:", sequences)

시퀀스: [[2, 1, 3, 4, 1, 5, 6, 7]]


In [11]:
text1 = "나랑 저녁 먹으러 갈래 저녁 메뉴는 불고기가 최고야"
sequences = tokenizer.texts_to_sequences([text1])
print("시퀀스:", sequences)

시퀀스: [[2, 3, 4, 5, 7]]


In [12]:
# 텍스트를 one-hot 인코딩 형태로 변환
one_hot_results = tokenizer.texts_to_matrix([text],
                                            mode='binary')
print("One-hot 인코딩:")
print(one_hot_results)

One-hot 인코딩:
[[0. 1. 1. 1. 1. 1. 1. 1.]]


In [13]:
# 시퀀스를 one-hot 인코딩 벡터로 변환
import numpy as np
from tensorflow.keras.utils import to_categorical

sequence_matrix = np.array(sequences[0])
one_hot_vectors = to_categorical(sequence_matrix, num_classes=len(word_index)+1)
print("One-hot 인코딩 벡터:")
print(one_hot_vectors)

One-hot 인코딩 벡터:
[[0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1.]]


In [15]:
# 데이터프레임으로 변환
# 컬럼명은 단어 사전 순서대로
import pandas as pd

columns = ['패딩(0)'] + [word for word, idx in sorted(word_index.items(), key=lambda x: x[1])]
df = pd.DataFrame(one_hot_vectors, columns=columns)
df

Unnamed: 0,패딩(0),점심,나랑,먹으러,갈래,메뉴는,햄버거가,최고야
0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0


In [16]:
# 텍스트 데이터
text = "나랑 점심 먹으러 갈래 점심 메뉴는 햄버거가 최고야"

# Tokenizer 초기화 및 학습
tokenizer = Tokenizer()
tokenizer.fit_on_texts([text])

# 단어 인덱스 확인
word_index = tokenizer.word_index
print("단어 인덱스:", word_index)

# 텍스트를 시퀀스로 변환
sequences = tokenizer.texts_to_sequences([text])
print("시퀀스:", sequences)

# 시퀀스를 one-hot 인코딩 벡터로 변환
sequence_matrix = np.array(sequences[0])
one_hot_vectors = to_categorical(sequence_matrix, num_classes=len(word_index)+1)
print("One-hot 인코딩 벡터:")
print(one_hot_vectors)

# 데이터프레임으로 변환
# 컬럼명은 단어 사전 순서대로
columns = ['패딩(0)'] + [word for word, idx in sorted(word_index.items(), key=lambda x: x[1])]
df = pd.DataFrame(one_hot_vectors, columns=columns)

# 원본 단어 추가 (각 벡터가 어떤 단어를 나타내는지 확인하기 위해)
words = text.split()
df['원본단어'] = words

df

단어 인덱스: {'점심': 1, '나랑': 2, '먹으러': 3, '갈래': 4, '메뉴는': 5, '햄버거가': 6, '최고야': 7}
시퀀스: [[2, 1, 3, 4, 1, 5, 6, 7]]
One-hot 인코딩 벡터:
[[0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1.]]


Unnamed: 0,패딩(0),점심,나랑,먹으러,갈래,메뉴는,햄버거가,최고야,원본단어
0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,나랑
1,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,점심
2,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,먹으러
3,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,갈래
4,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,점심
5,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,메뉴는
6,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,햄버거가
7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,최고야


In [17]:
# Why empty?
sub_text = "친구랑 수영장가서 불고기를 먹을 거야"
encoded = tokenizer.texts_to_sequences([sub_text])[0]
print(encoded)

[]


#### 2.2 Bag of Words (BoW)

<img src='https://miro.medium.com/v2/resize:fit:661/0*cf1wq8eIix-Z2qIf.png'>

- 📌 구현 도구: `CountVectorizer`
- 단어들의 순서는 전혀 고려하지 않고, 단어들의 출현 빈도(frequency)에만 집중하는 텍스트 데이터의 수치화 표현 방법
- Bag of Words를 직역하면 단어들의 가방


In [18]:
# https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html

from sklearn.feature_extraction.text import CountVectorizer
corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
print(vectorizer.get_feature_names_out())
print(X.toarray())

['and' 'document' 'first' 'is' 'one' 'second' 'the' 'third' 'this']
[[0 1 1 1 0 0 1 0 1]
 [0 2 0 1 0 1 1 0 1]
 [1 0 0 1 1 0 1 1 1]
 [0 1 1 1 0 0 1 0 1]]


In [19]:
pd.DataFrame(X.toarray(), columns=vectorizer.get_feature_names_out())

Unnamed: 0,and,document,first,is,one,second,the,third,this
0,0,1,1,1,0,0,1,0,1
1,0,2,0,1,0,1,1,0,1
2,1,0,0,1,1,0,1,1,1
3,0,1,1,1,0,0,1,0,1



#### 2.3 N-gram 모델
- 📌 구현 도구: `CountVectorizer(ngram_range=(n, n))`
<img src='https://blog.kakaocdn.net/dn/bhyN57/btq7iOd1q9p/ubLDVIHMqJpzx6HTkXYZWk/img.png'>


In [20]:
vectorizer2 = CountVectorizer(analyzer='word', ngram_range=(2, 2))
X2 = vectorizer2.fit_transform(corpus)
pd.DataFrame(X2.toarray(), columns=vectorizer2.get_feature_names_out())

Unnamed: 0,and this,document is,first document,is the,is this,second document,the first,the second,the third,third one,this document,this is,this the
0,0,0,1,1,0,0,1,0,0,0,0,1,0
1,0,1,0,1,0,1,0,1,0,0,1,0,0
2,1,0,0,1,0,0,0,0,1,1,0,1,0
3,0,0,1,0,1,0,1,0,0,0,0,0,1



#### **2.4 [TF-IDF](https://d-craftshop.tistory.com/26)**
- 📌 구현 도구: `TfidfVectorizer`
- **주의: 직접 계산한 결과와 TfidfVectorizer의 결과가 다를 수 있는데, 이는 TfidfVectorizer가 기본적으로 L2 정규화를 적용하기 때문**

<img src='https://blog.kakaocdn.net/dn/K9evG/btr6Bkx9mDG/h4C1zzkaH9sFeeBq3YDYS1/img.png'>

In [44]:
text1 =['The car is driven on the road.',
        'The truck is driven on the highway']

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(text1)
tf = pd.DataFrame(X.toarray(), columns=vectorizer.get_feature_names_out())
tf.T

Unnamed: 0,0,1
car,1,0
driven,1,1
highway,0,1
is,1,1
on,1,1
road,1,0
the,2,2
truck,0,1


In [51]:
from sklearn.feature_extraction.text import TfidfVectorizer

# TfidfVectorizer 인스턴스 생성
vectorizer = TfidfVectorizer()

# 문서에서 TF-IDF 특성 추출
tfidf_matrix = vectorizer.fit_transform(text1)

# 특성 이름(단어) 가져오기
feature_names = vectorizer.get_feature_names_out()

# TF-IDF 행렬을 데이터프레임으로 변환하여 보기 좋게 표시
df = pd.DataFrame(
    tfidf_matrix.toarray(),
    columns=feature_names
)

# 결과 출력
print("TF-IDF 행렬:")
df.T

TF-IDF 행렬:


Unnamed: 0,0,1
car,0.424717,0.0
driven,0.30219,0.30219
highway,0.0,0.424717
is,0.30219,0.30219
on,0.30219,0.30219
road,0.424717,0.0
the,0.60438,0.60438
truck,0.0,0.424717


In [21]:
corpus

['This is the first document.',
 'This document is the second document.',
 'And this is the third one.',
 'Is this the first document?']

In [22]:
vocab = list(set(w for doc in corpus for w in doc.split()))
vocab.sort()
print(vocab)

['And', 'Is', 'This', 'document', 'document.', 'document?', 'first', 'is', 'one.', 'second', 'the', 'third', 'this']


- (1) tf(d,t) : 특정 문서 d에서의 특정 단어 t의 등장 횟수.
- (2) df(t) : 특정 단어 t가 등장한 문서의 수.
- (3) idf(t) : df(t)에 반비례하는 수.

In [23]:
N = len(corpus)

print(N)


from math import log # IDF 계산을 위해

def tf(t, d):  # Term Frequency
  return d.count(t)

def idf(t):    # Inverse Document Frequency
  df = 0
  for doc in corpus:
    df += t in doc
  return log(N/(df+1))

def tfidf(t, d):  # tf-idf
  return tf(t,d)* idf(t)

4


In [24]:
# Term Frequency 구하기

result = []

# 각 문서에 대해서 아래 연산을 반복
for i in range(N):
  result.append([])
  d = corpus[i]
  for j in range(len(vocab)):
    t = vocab[j]
    result[-1].append(tf(t, d))

tf_ = pd.DataFrame(result, columns = vocab)
tf_


Unnamed: 0,And,Is,This,document,document.,document?,first,is,one.,second,the,third,this
0,0,0,1,1,1,0,1,2,0,0,1,0,0
1,0,0,1,2,1,0,0,2,0,1,1,0,0
2,1,0,0,0,0,0,0,2,1,0,1,1,1
3,0,1,0,1,0,1,1,1,0,0,1,0,1


In [25]:
# idf 구하기

result = []
for j in range(len(vocab)):
    t = vocab[j]
    result.append(idf(t))

idf_ = pd.DataFrame(result, index=vocab, columns=["IDF"])
idf_


Unnamed: 0,IDF
And,0.693147
Is,0.693147
This,0.287682
document,0.0
document.,0.287682
document?,0.693147
first,0.287682
is,-0.223144
one.,0.693147
second,0.693147


In [26]:
result = []
for i in range(N):
  result.append([])
  d = corpus[i]
  for j in range(len(vocab)):
    t = vocab[j]
    result[-1].append(tfidf(t,d))

tfidf_ = pd.DataFrame(result, columns = vocab)
tfidf_


Unnamed: 0,And,Is,This,document,document.,document?,first,is,one.,second,the,third,this
0,0.0,0.0,0.287682,0.0,0.287682,0.0,0.287682,-0.446287,0.0,0.0,-0.223144,0.0,0.0
1,0.0,0.0,0.287682,0.0,0.287682,0.0,0.0,-0.446287,0.0,0.693147,-0.223144,0.0,0.0
2,0.693147,0.0,0.0,0.0,0.0,0.0,0.0,-0.446287,0.693147,0.0,-0.223144,0.693147,0.287682
3,0.0,0.693147,0.0,0.0,0.0,0.693147,0.287682,-0.223144,0.0,0.0,-0.223144,0.0,0.287682


In [27]:
from sklearn.feature_extraction.text import TfidfVectorizer
tfidfv = TfidfVectorizer().fit(corpus)
pd.DataFrame(tfidfv.transform(corpus).toarray(), columns=tfidfv.vocabulary_)

Unnamed: 0,this,is,the,first,document,second,and,third,one
0,0.0,0.469791,0.580286,0.384085,0.0,0.0,0.384085,0.0,0.384085
1,0.0,0.687624,0.0,0.281089,0.0,0.538648,0.281089,0.0,0.281089
2,0.511849,0.0,0.0,0.267104,0.511849,0.0,0.267104,0.511849,0.267104
3,0.0,0.469791,0.580286,0.384085,0.0,0.0,0.384085,0.0,0.384085


### 단어가 많아지면 컬럼이 엄청많아지고 차원의 저주, 비효율적
### **the curse of dimensionality**
<img src ='https://blog.kakaocdn.net/dn/boSJYB/btraVTqkwC1/SyMcbBfsrOozbaoeKQ4ilK/img.png'>

In [29]:
import nltk
from nltk.corpus import gutenberg # import the gutenberg object

nltk.download('gutenberg') # download the Gutenberg corpus if not already downloaded

[nltk_data] Downloading package gutenberg to /root/nltk_data...
[nltk_data]   Unzipping corpora/gutenberg.zip.


True

In [30]:
raw_text = gutenberg.raw('austen-emma.txt')
text = raw_text[:10000]
text

'[Emma by Jane Austen 1816]\n\nVOLUME I\n\nCHAPTER I\n\n\nEmma Woodhouse, handsome, clever, and rich, with a comfortable home\nand happy disposition, seemed to unite some of the best blessings\nof existence; and had lived nearly twenty-one years in the world\nwith very little to distress or vex her.\n\nShe was the youngest of the two daughters of a most affectionate,\nindulgent father; and had, in consequence of her sister\'s marriage,\nbeen mistress of his house from a very early period.  Her mother\nhad died too long ago for her to have more than an indistinct\nremembrance of her caresses; and her place had been supplied\nby an excellent woman as governess, who had fallen little short\nof a mother in affection.\n\nSixteen years had Miss Taylor been in Mr. Woodhouse\'s family,\nless as a governess than a friend, very fond of both daughters,\nbut particularly of Emma.  Between _them_ it was more the intimacy\nof sisters.  Even before Miss Taylor had ceased to hold the nominal\noffice o

In [31]:
# prompt: raw_text[:1000]의 내용을 countervector, tf-idf로 변환

tokens = word_tokenize(text)
stop_words = set(stopwords.words('english'))
words = [w.lower() for w in tokens if w.isalnum() and w.lower() not in stop_words]

# CountVectorizer
vectorizer = CountVectorizer()
X_count = vectorizer.fit_transform(words)  # words를 리스트로 변환해서 사용
print(X_count.toarray())
print(vectorizer.vocabulary_)
pd.DataFrame(X_count.toarray(), columns=vectorizer.vocabulary_)

[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]
{'emma': 134, 'jane': 235, 'austen': 36, '1816': 0, 'volume': 470, 'chapter': 63, 'woodhouse': 488, 'handsome': 193, 'clever': 75, 'rich': 373, 'comfortable': 78, 'home': 208, 'happy': 197, 'disposition': 121, 'seemed': 383, 'unite': 454, 'best': 49, 'blessings': 52, 'existence': 151, 'lived': 255, 'nearly': 302, 'years': 493, 'world': 491, 'little': 253, 'distress': 123, 'vex': 466, 'youngest': 495, 'two': 452, 'daughters': 98, 'affectionate': 11, 'indulgent': 225, 'father': 155, 'consequence': 87, 'sister': 398, 'marriage': 272, 'mistress': 291, 'house': 212, 'early': 129, 'period': 336, 'mother': 294, 'died': 109, 'long': 259, 'ago': 17, 'indistinct': 224, 'remembrance': 367, 'caresses': 59, 'place': 338, 'supplied': 421, 'excellent': 149, 'woman': 487, 'governess': 186, 'fallen': 152, 'short': 394, 'affection': 10, 'sixteen': 402, 'miss': 290, 'taylor': 429, 'fami

Unnamed: 0,emma,jane,austen,1816,volume,chapter,woodhouse,handsome,clever,rich,...,sir,beautiful,moonlight,mild,draw,back,fire,found,damp,dirt
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
795,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
796,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
797,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
798,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [32]:
# TF-IDF Vectorizer
tfidf_vectorizer = TfidfVectorizer()
X_tfidf = tfidf_vectorizer.fit_transform(words) # words를 리스트로 변환해서 사용
print("\nTF-IDF Vectorizer Result:")
print(X_tfidf.toarray())
tfidf_vectorizer.vocabulary_
pd.DataFrame(X_tfidf.toarray(), columns=tfidf_vectorizer.vocabulary_)


TF-IDF Vectorizer Result:
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]


Unnamed: 0,emma,jane,austen,1816,volume,chapter,woodhouse,handsome,clever,rich,...,sir,beautiful,moonlight,mild,draw,back,fire,found,damp,dirt
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
795,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
796,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
797,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
798,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
