- Create a data directory inside the 02.pandas directory
- Install the Rainbow CSV extension from the Extensions panel
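If you would rather create the folder from code than from the file explorer, here is a minimal sketch using the standard library's pathlib. It assumes the notebook's working directory is the 02.pandas folder, so the relative path "data" lands in the right place.

```python
from pathlib import Path

# Assumption: the current working directory is the 02.pandas folder.
data_dir = Path("data")

# exist_ok=True makes this safe to re-run; parents=True also creates missing parent folders
data_dir.mkdir(parents=True, exist_ok=True)
print(data_dir.resolve())
```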

### 2. Data Input and Output

In [1]:
import pandas as pd
import numpy as np

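- %%writefile: a (cell) magic command that writes the contents of the cell to the given file
    - Keep column names free of spaces (write c1,c2,c3 rather than c1, c2, c3)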
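Why the spaces matter: by default read_csv keeps a space that follows a comma as part of the column name. The sketch below uses a hypothetical in-memory CSV (not one of the notebook's files) to show the leftover spaces and how the real read_csv option skipinitialspace=True removes them.

```python
import io
import pandas as pd

# Hypothetical CSV text with a space after each comma
text = "c1, c2, c3\n1, 1.11, one\n2, 2.22, two\n"

df_raw = pd.read_csv(io.StringIO(text))
print(list(df_raw.columns))     # the spaces survive in the names: ['c1', ' c2', ' c3']

# skipinitialspace=True drops spaces right after each delimiter, in names and values alike
df_clean = pd.read_csv(io.StringIO(text), skipinitialspace=True)
print(list(df_clean.columns))   # ['c1', 'c2', 'c3']
print(df_clean["c3"].tolist())  # ['one', 'two']
```

With leading spaces in the names, df_raw['c2'] would raise a KeyError, which is why writing the header without spaces is the simpler habit.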

In [2]:
%%writefile data/sample1.csv
c1,c2,c3
1,1.11,one
2,2.22,two
3,3.33,three

Overwriting data/sample1.csv
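%%writefile only exists inside Jupyter/IPython. A rough plain-Python stand-in, assuming you run a regular script instead, writes the same sample1.csv data with pathlib; like %%writefile, write_text creates the file or overwrites an existing one.

```python
from pathlib import Path

# Assumption: plain Python script, where the %%writefile cell magic is unavailable.
path = Path("data/sample1.csv")
path.parent.mkdir(parents=True, exist_ok=True)

# Creates the file, or overwrites it if it already exists
path.write_text("c1,c2,c3\n1,1.11,one\n2,2.22,two\n3,3.33,three\n", encoding="utf-8")

lines = path.read_text(encoding="utf-8").splitlines()
print(lines[0], len(lines))   # c1,c2,c3 4
```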


- Reading a CSV file

In [3]:
df = pd.read_csv('data/sample1.csv')
df

   c1    c2     c3
0   1  1.11    one
1   2  2.22    two
2   3  3.33  three
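read_csv also accepts any file-like object, so the same parse can be sketched without touching the disk by wrapping the text of sample1.csv in io.StringIO. The first line becomes the header and pandas infers one dtype per column.

```python
import io
import pandas as pd

# Same contents as data/sample1.csv, held in memory so the sketch needs no file
csv_text = "c1,c2,c3\n1,1.11,one\n2,2.22,two\n3,3.33,three\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)    # (3, 3)
print(df.dtypes)   # c1 -> integers, c2 -> floats, c3 -> strings
```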


In [4]:
df.columns

Index(['c1', 'c2', 'c3'], dtype='object')

In [5]:
%%writefile data/sample2.csv
1,1.11,one
2,2.22,two
3,3.33,three

Overwriting data/sample2.csv


In [6]:
df = pd.read_csv('data/sample2.csv')    # Wrong: with no header line, the first data row is used as the column names
df

   1  1.11    one
0  2  2.22    two
1  3  3.33  three


In [7]:
df = pd.read_csv('data/sample2.csv', names=['c1', 'c2', 'c3'])  # assign the column names explicitly
df

   c1    c2     c3
0   1  1.11    one
1   2  2.22    two
2   3  3.33  three
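When a file has no header line, header=None is the other common option: every line stays as data and the columns are numbered 0, 1, 2. A small in-memory sketch with the rows of sample2.csv compares it with names=[...], which also implies there is no header line, so no row is lost.

```python
import io
import pandas as pd

# Same rows as data/sample2.csv (no header line), held in memory
csv_text = "1,1.11,one\n2,2.22,two\n3,3.33,three\n"

# header=None: keep all three lines as data, columns are numbered
df_num = pd.read_csv(io.StringIO(csv_text), header=None)
print(list(df_num.columns))   # [0, 1, 2]

# names=[...]: same rows, but with meaningful column names
df_named = pd.read_csv(io.StringIO(csv_text), names=["c1", "c2", "c3"])
print(len(df_named))          # 3
```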


In [8]:
df = pd.read_csv('data/sample1.csv', index_col='c1')
df

      c2     c3
c1
1   1.11    one
2   2.22    two
3   3.33  three
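With index_col='c1', the c1 values become row labels rather than a data column, which is why the index name appears on its own line in the output above. A short in-memory sketch of what that means for lookups:

```python
import io
import pandas as pd

csv_text = "c1,c2,c3\n1,1.11,one\n2,2.22,two\n3,3.33,three\n"
df = pd.read_csv(io.StringIO(csv_text), index_col="c1")

print(df.index.name)      # c1
print(list(df.columns))   # ['c2', 'c3']  (c1 is no longer a column)
print(df.loc[2, "c3"])    # rows are now looked up by their c1 label -> two
```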


In [9]:
%%writefile data/sample3.tsv
c1        c2        c3        c4
0.179181 -1.538472  1.347553  0.43381
1.024209  0.087307 -1.281997  0.49265
0.417899 -2.002308  0.255245 -1.10515

Overwriting data/sample3.tsv


In [10]:
df = pd.read_csv('data/sample3.tsv', sep=r'\s+')  # \s+ : one or more whitespace characters (spaces or tabs)
df

         c1         c2         c3        c4
0  0.179181  -1.538472   1.347553   0.43381
1  1.024209   0.087307  -1.281997   0.49265
2  0.417899  -2.002308   0.255245  -1.10515
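Because \s+ is a regular expression, any run of spaces and tabs counts as a single separator. The hypothetical text below mixes both; writing the pattern as a raw string (r"\s+") also avoids Python's invalid-escape warning for "\s".

```python
import io
import pandas as pd

# Hypothetical text whose columns are separated by runs of spaces and/or tabs
text = "c1   c2\tc3\n1    2\t3\n4  5 \t6\n"

df = pd.read_csv(io.StringIO(text), sep=r"\s+")
print(df.shape)            # (2, 3)
print(df["c3"].tolist())   # [3, 6]
```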


In [11]:
df.to_csv('data/test.tsv', sep='\t', index=False)   # \t : tab (*did not work in VS Code)

In [12]:
df = pd.read_csv('data/test.tsv', sep='\t')  # read back the tab-separated file written above
df

         c1         c2         c3        c4
0  0.179181  -1.538472   1.347553   0.43381
1  1.024209   0.087307  -1.281997   0.49265
2  0.417899  -2.002308   0.255245  -1.10515
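A tab-separated round trip can also be checked in memory: when to_csv is called without a path it returns the text instead of writing a file, and reading it back needs the same sep. The small frame here is made up for illustration.

```python
import io
import pandas as pd

df = pd.DataFrame({"c1": [0.179181, 1.024209], "c2": [-1.538472, 0.087307]})

# No path argument: to_csv returns the file contents as a string
tsv_text = df.to_csv(sep="\t", index=False)
print(repr(tsv_text.splitlines()[0]))   # 'c1\tc2'

# Read it back with the same separator
back = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(list(back.columns))               # ['c1', 'c2']
```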


- Multi-line headers (extra lines above the column names)
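The idea, as an in-memory sketch with text similar to sample4.csv: skiprows=2 skips the first two lines, and skiprows also accepts a list of (0-based) line numbers to skip.

```python
import io
import pandas as pd

# Two description lines above the real header, similar to sample4.csv
text = ("File title: sample4.txt\n"
        "Description of the data format:\n"
        "c1,c2,c3\n"
        "1,1.11,one\n")

a = pd.read_csv(io.StringIO(text), skiprows=2)       # skip the first two lines
b = pd.read_csv(io.StringIO(text), skiprows=[0, 1])  # skip lines 0 and 1 explicitly
print(list(a.columns), a.equals(b))                  # ['c1', 'c2', 'c3'] True
```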

In [13]:
%%writefile data/sample4.csv
File title: sample4.txt
Description of the data format:
c1,c2,c3
1,1.11,one
2,2.22,two
3,3.33,three

Writing data/sample4.csv


In [14]:
df = pd.read_csv('data/sample4.csv', skiprows=2)  # skip the two description lines above the header
df

   c1    c2     c3
0   1  1.11    one
1   2  2.22    two
2   3  3.33  three

- Writing a CSV file
    - Set index=False: by default the index is also written as an unnamed first column, which comes back as 'Unnamed: 0' when the file is read again
    - header=False (omit the column names) is also possible, but rarely used

In [None]:
df.to_csv('data/sample6.csv')
df = pd.read_csv('data/sample6.csv')
df

   Unnamed: 0  c1    c2     c3
0           0   1  1.11    one
1           1   2  2.22    two
2           2   3  3.33  three
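The 'Unnamed: 0' column is the old index written to the file. A minimal in-memory sketch (hypothetical two-row frame) shows where it comes from and two ways to avoid it: index=False when writing, or index_col=0 when reading.

```python
import io
import pandas as pd

df = pd.DataFrame({"c1": [1, 2], "c2": ["a", "b"]})

# Default to_csv writes the index as a first column with an empty header
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(list(with_index.columns))   # ['Unnamed: 0', 'c1', 'c2']

# Option 1: do not write the index at all
no_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))
# Option 2: turn that first column back into the index when reading
restored = pd.read_csv(io.StringIO(df.to_csv()), index_col=0)
print(list(no_index.columns), list(restored.columns))   # ['c1', 'c2'] ['c1', 'c2']
```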


In [None]:
del df['Unnamed: 0']
df.to_csv('data/sample7.csv', index=False)
df = pd.read_csv('data/sample7.csv')
df

- Files on the internet (a CSV file served on the web can be read directly from its URL)

In [None]:
df = pd.read_csv("https://raw.githubusercontent.com/datascienceschool/docker_rpython/master/data/titanic.csv")

In [None]:
df.head()

In [None]:
# Delete the Name and Ticket columns and save the result as titanic.csv
del df['Name'], df['Ticket']
df.to_csv('data/titanic.csv', index=False)  # save

titanic = pd.read_csv('data/titanic.csv')
titanic.head()

In [None]:
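# To delete several columns at once, the drop method is also commonly used.
# Name and Ticket were already deleted above, so reload the original data first (otherwise drop raises KeyError).
df = pd.read_csv("https://raw.githubusercontent.com/datascienceschool/docker_rpython/master/data/titanic.csv")
df.drop(columns=['Name', 'Ticket'], inplace=True)
df.head()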
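One difference between del and drop worth knowing: without inplace=True, drop returns a new DataFrame and leaves the original untouched, and asking it to drop a column that is already gone raises KeyError unless errors="ignore" is passed. A sketch on a made-up one-row frame:

```python
import pandas as pd

# Hypothetical one-row frame with the same column names as the Titanic example
df = pd.DataFrame({"Name": ["A"], "Ticket": ["T1"], "Age": [22]})

# drop returns a new DataFrame; df itself keeps all three columns
smaller = df.drop(columns=["Name", "Ticket"])
print(list(smaller.columns), list(df.columns))   # ['Age'] ['Name', 'Ticket', 'Age']

# Dropping columns that no longer exist: errors="ignore" skips them instead of raising KeyError
again = smaller.drop(columns=["Name", "Ticket"], errors="ignore")
print(list(again.columns))                       # ['Age']
```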