## Python Package - pandas

### pandas
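- A Python package for data processing. It plays a role similar to Excel, except that tables are manipulated through code.
- pandas depends on other packages, e.g. NumPy (Numerical Python), a package widely used in scientific computing.
- In pandas, the object that corresponds to an Excel sheet is called a DataFrame.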

---

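- Recommended book
  - [Python資料分析 (O’Reilly)](https://www.books.com.tw/products/0010968510?srsltid=AfmBOoryGkv8JPPyoZ8S2Iu-KtqMNQyNL-ID14nWQDcx3b5jpDw_3tPd)
- Similar packages
  - polars
    - Covers much the same tasks as pandas and can convert to and from pandas DataFrames
    - Often processes data faster, especially on large datasets
    - Newer, so there are fewer tutorials and usage guides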

In [None]:
# Import the package.
# A plain `import pandas` works, but every call then has to be written as pandas.function_name.
# By convention pandas is imported under the short alias pd, which saves repetitive typing.
import pandas as pd

# Create a simple DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
display(df)
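
To make the "Excel sheet" analogy concrete, the following minimal sketch inspects the structure of the same DataFrame: its shape, column names, and per-column data types. It rebuilds the table so that it runs on its own; the variable name `ages` is only illustrative.

```python
import pandas as pd

# Same sample data as in the cell above
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column labels, like the header row of a sheet
print(df.dtypes)         # data type of each column

# Selecting a single column returns a pandas Series
ages = df['Age']
print(type(ages).__name__)
```

Each column carries its own data type, which is why numeric operations apply to `Age` but not to `Name` or `City`.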

In [None]:
print("DataFrame:")
display(df)

In [None]:
df.describe()
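
By default `describe()` summarizes only the numeric columns, so for this table it reports statistics for `Age` alone. The sketch below, which rebuilds the sample data so it runs on its own, shows one way to include the text columns as well by passing `include='all'`. The names `summary` and `full` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# Numeric columns only: count, mean, std, min, quartiles, max
summary = df.describe()
print(summary)

# All columns: text columns additionally get unique, top, and freq
full = df.describe(include='all')
print(full)
```

Statistics that do not apply to a column (for example `mean` for `Name`) appear as `NaN` in the combined table.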

In [None]:
# Filter rows where Age > 30
filtered_df = df[df['Age'] > 30]
print("Filtered DataFrame (Age > 30):")
display(filtered_df)
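
The filter above uses a single condition. As a hedged sketch of how conditions might be combined, the example below uses `&` (and) with parentheses around each comparison, plus `isin` for membership tests. The variable names `mask`, `result`, and `in_cities` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# Each comparison yields a boolean Series; wrap each one in parentheses
# because & and | bind more tightly than comparison operators.
mask = (df['Age'] >= 30) & (df['City'] != 'Chicago')
result = df[mask]
print(result)

# isin keeps rows whose value appears in the given list
in_cities = df[df['City'].isin(['New York', 'Chicago'])]
print(in_cities)
```

Python's `and` / `or` keywords do not work element-wise on a Series, which is why the bitwise operators `&` and `|` are used here.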

In [None]:
# Add a new column
df['Salary'] = [70000, 80000, 90000]
print("DataFrame with new column (Salary):")
display(df)
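
A new column can also be computed from existing ones rather than typed in by hand. The sketch below assumes a hypothetical `MonthlySalary` column derived from `Salary`, and uses `assign`, which returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000],
})

# Arithmetic on a column is applied to every row at once.
# MonthlySalary is a hypothetical column name used for illustration.
with_monthly = df.assign(MonthlySalary=df['Salary'] / 12)
print(with_monthly)

# The original DataFrame still has only its three columns
print(list(df.columns))
```

Writing `df['MonthlySalary'] = df['Salary'] / 12` would add the column in place instead, in the same way as the `Salary` assignment above.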


In [None]:
# Save the DataFrame to a CSV file
df.to_csv('output1.csv', index=False)

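# Writing .xlsx uses the openpyxl package, which handles only the newer Excel format;
# it cannot read or write legacy Excel 97-2003 ".xls" files.
# Install it first: pip install openpyxl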
df.to_excel('output2.xlsx', index=False)
print("DataFrame saved")
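
To check that a saved file can be loaded again, the following minimal sketch writes the table to a CSV file and reads it back with `pd.read_csv`. It uses a temporary directory (an assumption made here so the example leaves no files behind); the names `csv_path` and `loaded` are illustrative.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
    'Salary': [70000, 80000, 90000],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'output1.csv')
    # index=False keeps the row index out of the file
    df.to_csv(csv_path, index=False)
    # Read the file back into a new DataFrame
    loaded = pd.read_csv(csv_path)

print(loaded)
```

The Excel counterpart is `pd.read_excel('output2.xlsx')`, which also relies on openpyxl for `.xlsx` files.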