## Matplotlib & Seaborn

### 1. Matplotlib & Seaborn
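- Matplotlib: The most fundamental plotting library in Python. It draws many chart types directly (line charts, bar charts, scatter plots, and so on) and offers extensive customization.
    - `plt.plot(x, y)`: line chart (for changes over time or trends in ordered data)
    - `plt.bar()`: vertical bar chart; `plt.barh()`: horizontal bar chart (for comparing values across categories)
    - `plt.scatter(x, y)`: scatter plot (for examining the relationship between two variables, such as correlation)
    - `plt.hist(data, bins=10)`: histogram (for seeing how continuous data is distributed: where values cluster and which ranges are most common)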


- Seaborn: A higher-level visualization library built on Matplotlib that is easier to use
    - `sns.lineplot(x=..., y=..., data=...)`: line chart
    - `sns.barplot`: bar chart (draws the mean of each category by default)
    - `sns.scatterplot(x='col1', y='col2', data=df)`: scatter plot
    - `sns.histplot(data=df['some_column'])`: histogram
    - `sns.boxplot(x='col1', y='col2', data=df)`: box plot
    - `sns.violinplot(x='col1', y='col2', data=df, inner="quartile")`: violin plot (`inner="quartile"` draws the median and quartile lines)
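
The four Matplotlib calls listed above can be tried side by side on a single figure. The following is a minimal sketch with made-up sample data; it uses the `Axes` methods (`ax.plot`, `ax.bar`, `ax.scatter`, `ax.hist`), which mirror the `plt.*` functions.

```python
import matplotlib.pyplot as plt

# Made-up sample data, for illustration only
x = [1, 2, 3, 4]
y = [10, 20, 15, 25]
categories = ["A", "B", "C"]
values = [5, 3, 7]
data = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# One figure with a 2x2 grid of axes
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].plot(x, y, marker="o")
axes[0, 0].set_title("plot: trend over ordered x")

axes[0, 1].bar(categories, values)
axes[0, 1].set_title("bar: compare categories")

axes[1, 0].scatter(x, y)
axes[1, 0].set_title("scatter: relationship")

axes[1, 1].hist(data, bins=5)
axes[1, 1].set_title("hist: distribution")

fig.tight_layout()
plt.show()
```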


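### 2. Styling Plots
- Adding a legend: `plt.legend()` (built from the `label=` passed to each plot call)
- Style sheets: `plt.style.use('ggplot')`   # also 'fivethirtyeight', 'seaborn-v0_8' (named 'seaborn' before Matplotlib 3.6), etc.
- Seaborn theme: `sns.set_theme(style="whitegrid")`
- Color palette: `sns.set_palette("pastel")`  # other options include "deep", "dark", "Set2"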
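
As a rough illustration of these options together, the sketch below draws two labeled lines, adds a legend, and applies the `ggplot` style sheet. The data values are made up, and `plt.style.context` is used instead of `plt.style.use` only so the style does not leak into later cells.

```python
import matplotlib.pyplot as plt
import seaborn as sns

months = [1, 2, 3, 4]
online = [10, 20, 15, 25]   # made-up values
offline = [8, 12, 18, 16]   # made-up values

# A Seaborn palette is a list of colors that can be indexed directly
pastel = sns.color_palette("pastel")

# style.context applies the style sheet only inside the with-block
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.plot(months, online, marker="o", color=pastel[0], label="online")
    ax.plot(months, offline, marker="s", color=pastel[1], label="offline")
    ax.legend()  # builds the legend from the label= arguments
    ax.set_title("Legend, style sheet, and palette")

plt.show()
```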

In [None]:
# pip install matplotlib seaborn pandas
# pip install plotly
# pip install --upgrade nbformat  # Plotly needs nbformat to render figures in a notebook



In [None]:
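# Put this at the top of the notebook so figures render directly inside the notebook cells.
# The comment must sit on its own line: text after a magic command is parsed as its argument.
%matplotlib inline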

In [None]:
import matplotlib.pyplot as plt
plt.rcParams['font.family'] = 'Malgun Gothic'  # if Korean text does not render, set a font that supports it (Malgun Gothic ships with Windows)
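
Since Malgun Gothic is a Windows font, the setting above may have no effect on other systems. A hedged sketch of a more portable variant follows; the candidate font names are assumptions about what is commonly installed, so adjust the list to your machine.

```python
import matplotlib.pyplot as plt
from matplotlib import font_manager

# Candidate Korean-capable fonts (assumed names; adjust to what is installed)
candidates = ["Malgun Gothic", "AppleGothic", "NanumGothic"]

# Names of all fonts Matplotlib has discovered on this system
installed = {font.name for font in font_manager.fontManager.ttflist}
korean_font = next((name for name in candidates if name in installed), None)

if korean_font is not None:
    plt.rcParams["font.family"] = korean_font
else:
    print("No candidate font found; Korean labels may render as empty boxes.")

# Some fonts lack the Unicode minus glyph, so fall back to the ASCII hyphen
plt.rcParams["axes.unicode_minus"] = False
```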

In [None]:
x = [1, 2, 3, 4]
y = [10, 20, 15, 25]

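plt.plot(x, y, marker='o')  # marker: the shape drawn at each data point
plt.title("기본 라인 그래프")  # title ("Basic line chart"); the Korean text exercises the font setting above
plt.xlabel("X축")  # x-axis label
plt.ylabel("Y축")  # y-axis label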
plt.show()

In [None]:
import seaborn as sns
import pandas as pd

df = pd.DataFrame({
    'month': ['Jan','Feb','Mar','Apr'],
    'sales': [10, 20, 15, 25]
})

sns.lineplot(x='month', y='sales', data=df)  # sns.lineplot(): line chart function
plt.title("Seaborn Line Chart")
plt.show()

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

days = ["Mon","Tue","Wed","Thu","Fri","Sat","Sun"]
sales = [10, 12, 9, 13, 15, 14, 11]

plt.plot(days, sales, marker="o")
plt.title("Daily Sales")
plt.xlabel("Day")
plt.ylabel("Sales")
plt.show()

In [None]:
df = pd.DataFrame({"day": days, "sales": sales})
sns.lineplot(x="day", y="sales", data=df, marker="o")
plt.title("Daily Sales (Seaborn)")
plt.show()

In [None]:
fruits = ["Apple","Banana","Cherry"]
stocks = [30, 18, 25]

# Horizontal bars
plt.barh(fruits, stocks)
# plt.bar(fruits, stocks)  # vertical bars (the "Count" label then belongs on the y-axis)
plt.title("Fruit Stocks")
plt.xlabel("Count")
plt.show()

In [None]:
basket = pd.DataFrame({"fruit": ["Apple","Banana","Apple","Cherry","Apple"]})
sns.countplot(x="fruit", data=basket)
plt.title("Fruit Count (Auto)")
plt.show()
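
`sns.countplot` counts the rows per category before drawing. A rough manual equivalent, using the same `basket` data with pandas and Matplotlib, might look like the sketch below. One difference to be aware of: `value_counts` sorts by count, so the bar order can differ from the Seaborn plot.

```python
import pandas as pd
import matplotlib.pyplot as plt

basket = pd.DataFrame({"fruit": ["Apple", "Banana", "Apple", "Cherry", "Apple"]})

# Count the rows per category by hand (sorted by count, largest first)
counts = basket["fruit"].value_counts()
print(counts)

plt.bar(counts.index, counts.values)
plt.title("Fruit Count (manual)")
plt.ylabel("Count")
plt.show()
```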

In [None]:
height = [160, 165, 170, 175, 180]
weight = [55, 60, 65, 72, 80]

plt.scatter(height, weight)
plt.title("Height vs Weight")
plt.xlabel("Height (cm)")
plt.ylabel("Weight (kg)")
plt.show()


In [None]:
df = pd.DataFrame({
    "height": [160,165,170,175,180,168,172],
    "weight": [55,60,65,72,80,58,67],
    "sex":    ["F","M","M","M","M","F","F"]
})
sns.scatterplot(x="height", y="weight", hue="sex", data=df)
plt.title("Height vs Weight by Sex")
plt.show()
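
A scatter plot shows the relationship visually; a correlation coefficient puts a number on it. The sketch below computes the Pearson correlation for the five height and weight pairs from the Matplotlib example, using `Series.corr` from pandas.

```python
import pandas as pd

df_hw = pd.DataFrame({
    "height": [160, 165, 170, 175, 180],
    "weight": [55, 60, 65, 72, 80],
})

# Series.corr computes the Pearson correlation by default
r = df_hw["height"].corr(df_hw["weight"])
print(f"Pearson r = {r:.3f}")
```

A value close to 1 means the points lie near an upward-sloping straight line.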

In [None]:
scores = [55, 60, 62, 65, 70, 72, 75, 78, 80, 82, 85, 88]
plt.hist(scores, bins=5)  # split the value range into 5 bins
plt.title("Score Distribution")
plt.xlabel("Score"); plt.ylabel("Count")
plt.show()

In [None]:
df = pd.DataFrame({"score": scores})
sns.histplot(data=df, x="score", bins=5, kde=True)  # kde=True overlays a smooth density curve
plt.title("Score Distribution (Seaborn)")
plt.show()
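
To see what `bins=5` does to the scores, the bin edges and counts can be computed directly. This sketch uses `np.histogram`, which applies the same equal-width binning to the same `scores` list.

```python
import numpy as np

scores = [55, 60, 62, 65, 70, 72, 75, 78, 80, 82, 85, 88]

# counts has one entry per bin; edges has one more entry than counts
counts, edges = np.histogram(scores, bins=5)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {n}")
```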

In [None]:
df = pd.DataFrame({
    "class": ["A","A","A","B","B","B", "B", "A"],
    "score": [70, 75, 80, 65, 68, 78, 200, 230]
})
sns.boxplot(x="class", y="score", data=df)
plt.title("Scores by Class (Boxplot)")
plt.show()

In [None]:
sns.violinplot(x="class", y="score", data=df, inner="quartile")  # draw the median and quartile lines
plt.title("Scores by Class (Violin)")
plt.show()
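
The scores 200 and 230 appear as separate points in the box plot because they fall outside the whiskers. Assuming the common 1.5 × IQR rule (the Matplotlib default for whiskers), the sketch below finds those points by hand; `iqr_outliers` is a helper name made up for this example.

```python
import pandas as pd

df_scores = pd.DataFrame({
    "class": ["A", "A", "A", "B", "B", "B", "B", "A"],
    "score": [70, 75, 80, 65, 68, 78, 200, 230],
})

def iqr_outliers(s):
    """Return the values of s lying more than 1.5 * IQR outside the quartiles."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s[(s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)]

# Apply the rule separately to each class
outliers = {
    name: iqr_outliers(group).tolist()
    for name, group in df_scores.groupby("class")["score"]
}
print(outliers)
```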

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Example: tips, a Seaborn sample dataset (load_dataset downloads it on first use, so it needs internet access)
tips = sns.load_dataset('tips')
tips.head()  # preview the first rows

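| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |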


In [None]:
# sns.scatterplot(x='total_bill', y='tip', data=tips, hue = 'time')
# plt.show()

sns.barplot(x='day', y='tip', data=tips)
plt.title("Average Tip by Day")
plt.show()
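
By default `sns.barplot` draws the mean of `y` for each `x` category, with an error bar on top. The bar heights can be checked with a `groupby`. The sketch below uses a tiny hypothetical frame rather than the real `tips` data so that it runs offline.

```python
import pandas as pd

# Hypothetical mini dataset standing in for tips
mini_tips = pd.DataFrame({
    "day": ["Thur", "Thur", "Thur", "Fri", "Fri"],
    "tip": [2.0, 3.0, 4.0, 1.0, 3.0],
})

# Mean tip per day: the values a default barplot would draw as bar heights
mean_tip = mini_tips.groupby("day")["tip"].mean()
print(mean_tip)
```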

In [None]:
sns.boxplot(x='smoker', y='tip', data=tips)
plt.title("Tip Distribution: Smokers vs Non-smokers")
plt.show()

In [47]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv('movie_metadata.csv')
df.head()

| | color | director_name | num_critic_for_reviews | duration | director_facebook_likes | actor_3_facebook_likes | actor_2_name | actor_1_facebook_likes | gross | genres | ... | num_user_for_reviews | language | country | content_rating | budget | title_year | actor_2_facebook_likes | imdb_score | aspect_ratio | movie_facebook_likes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Color | James Cameron | 723.0 | 178.0 | 0.0 | 855.0 | Joel David Moore | 1000.0 | 760505847.0 | Action\|Adventure\|Fantasy\|Sci-Fi | ... | 3054.0 | English | USA | PG-13 | 237000000.0 | 2009.0 | 936.0 | 7.9 | 1.78 | 33000 |
| 1 | Color | Gore Verbinski | 302.0 | 169.0 | 563.0 | 1000.0 | Orlando Bloom | 40000.0 | 309404152.0 | Action\|Adventure\|Fantasy | ... | 1238.0 | English | USA | PG-13 | 300000000.0 | 2007.0 | 5000.0 | 7.1 | 2.35 | 0 |
| 2 | Color | Sam Mendes | 602.0 | 148.0 | 0.0 | 161.0 | Rory Kinnear | 11000.0 | 200074175.0 | Action\|Adventure\|Thriller | ... | 994.0 | English | UK | PG-13 | 245000000.0 | 2015.0 | 393.0 | 6.8 | 2.35 | 85000 |
| 3 | Color | Christopher Nolan | 813.0 | 164.0 | 22000.0 | 23000.0 | Christian Bale | 27000.0 | 448130642.0 | Action\|Thriller | ... | 2701.0 | English | USA | PG-13 | 250000000.0 | 2012.0 | 23000.0 | 8.5 | 2.35 | 164000 |
| 4 | | Doug Walker | | | 131.0 | | Rob Walker | 131.0 | | Documentary | ... | | | | | | | 12.0 | 7.1 | | 0 |


In [None]:
import matplotlib.pyplot as plt

# Count the number of movies per year
movies_per_year = df['title_year'].value_counts().sort_index()  # per-year counts, sorted by year (missing years are skipped)
years = movies_per_year.index.astype(int)
counts = movies_per_year.values

# Draw the bar chart
plt.figure(figsize=(6,4))
plt.bar(years, counts, color='skyblue', width=0.6)
plt.xlabel('Year')
plt.ylabel('Number of Movies')
plt.title('Number of Movies Released Each Year')
# plt.xticks(years, rotation=90)
plt.tight_layout()
plt.show()
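
The counting step above relies on `value_counts` dropping missing values, which is why the later `astype(int)` does not fail. A small sketch with a hypothetical stand-in for `df['title_year']` shows the behavior:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df['title_year'], including one missing year
title_year = pd.Series([2009.0, 2007.0, 2009.0, np.nan])

# value_counts drops NaN by default; sort_index orders the result by year
per_year = title_year.value_counts().sort_index()
sample_years = per_year.index.astype(int)

for year, n in zip(sample_years, per_year.values):
    print(year, n)
```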

In [None]:
import seaborn as sns

plt.figure(figsize=(6,4))
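# Cast the float years to int so tick labels read 2009 rather than 2009.0.
# Note: seaborn 0.13+ emits a warning when palette is given without hue.
sns.barplot(x=movies_per_year.index.astype(int), y=movies_per_year.values, palette='pastel')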
plt.xlabel('Year')
plt.ylabel('Number of Movies')
plt.title('Number of Movies Released Each Year (Seaborn)')
# plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

In [None]:
import plotly.express as px

fig = px.bar(
    x=movies_per_year.index,
    y=movies_per_year.values,
    labels={'x':'Year', 'y':'Number of Movies'},
    title='Number of Movies Released Each Year'
)
fig.show()