# Learning the pandas Library

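`pandas` is a widely used Python library for data analysis and data manipulation. It provides efficient, convenient data structures and analysis tools, and is especially well suited to structured (tabular) data. Its two main data structures are `Series` and `DataFrame`.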

---

### 1. Installing pandas

In [None]:
%pip install pandas

---

### 2. Basic Data Structures

In [2]:
import pandas as pd

# Create a Series: a one-dimensional labeled array
data = pd.Series([1, 3, 5, 7, 9])
data

0    1
1    3
2    5
3    7
4    9
dtype: int64
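The labels of a `Series` do not have to be the default 0, 1, 2, ... positions. The following is a minimal sketch with explicit labels; the labels `"a"`, `"b"`, `"c"` and the variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical labels chosen for illustration only
s = pd.Series([1, 3, 5], index=["a", "b", "c"])

by_label = s["b"]          # look up by label
by_position = s.iloc[1]    # look up by integer position
doubled = s * 2            # arithmetic is element-wise and keeps the labels

print(by_label, by_position)
print(doubled)
```

Here both lookups reach the same element, because `"b"` is the second label.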

In [3]:
# Create a DataFrame: a two-dimensional labeled table
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)


      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
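Once a `DataFrame` exists, a few attributes and methods give a quick overview of its structure. This sketch rebuilds the same small table so that it runs on its own; which attributes to check first is a matter of taste, not a fixed recipe.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column labels
print(df.dtypes)         # one dtype per column
print(df.head(2))        # first two rows
```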


---

### 3. Importing and Exporting Data

In [5]:
# Read data from a CSV file
df = pd.read_csv('iris_dataset.csv')
print(df)

     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                  5.1               3.5                1.4               0.2   
1                  4.9               3.0                1.4               0.2   
2                  4.7               3.2                1.3               0.2   
3                  4.6               3.1                1.5               0.2   
4                  5.0               3.6                1.4               0.2   
..                 ...               ...                ...               ...   
145                6.7               3.0                5.2               2.3   
146                6.3               2.5                5.0               1.9   
147                6.5               3.0                5.2               2.0   
148                6.2               3.4                5.4               2.3   
149                5.9               3.0                5.1               1.8   

       species  
0       se

In [9]:
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
# Export the DataFrame to a CSV file, leaving out the index column
df.to_csv('out.csv', index=False)
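To see what export followed by import does without touching the file system, the CSV text can be kept in memory. This is a sketch under that assumption: it uses `io.StringIO` in place of the `out.csv` file, and the two-row table is a shortened stand-in.

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# With no path argument, to_csv returns the CSV text as a string
csv_text = df.to_csv(index=False)

# read_csv accepts file-like objects as well as paths
restored = pd.read_csv(io.StringIO(csv_text))

print(csv_text)
print(restored)
```

Because `index=False` was used when writing, the restored table has the same two columns and no extra index column.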

---

### 4. Selecting and Filtering Data

In [10]:
# Select a single column by name
ages = df['Age']
ages

0    25
1    30
2    35
Name: Age, dtype: int64

In [11]:
# Select a row by its index label
row_0 = df.loc[0]
print(row_0)

Name       Alice
Age           25
City    New York
Name: 0, dtype: object


In [12]:
# Select a row by its integer position
row_0 = df.iloc[0]
print(row_0)

Name       Alice
Age           25
City    New York
Name: 0, dtype: object
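With the default index, `loc[0]` and `iloc[0]` return the same row, which hides the difference between them. The sketch below uses a hypothetical non-default index (10, 20, 30) so that label and position no longer coincide.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=[10, 20, 30],  # hypothetical non-default labels
)

by_label = df.loc[20]       # row whose index label is 20
by_position = df.iloc[0]    # first row, regardless of its label

print(by_label["Name"])
print(by_position["Name"])

# Rows and columns together: label-based vs position-based
print(df.loc[20, "Age"])
print(df.iloc[0, 1])
```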


In [13]:
# Keep only the rows that satisfy a condition
adults = df[df['Age'] > 30]
print(adults)

      Name  Age     City
2  Charlie   35  Chicago
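Several conditions can be combined into one boolean mask. The following sketch reuses the same table; the particular thresholds and city names are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# Each condition needs its own parentheses; use & (and), | (or), ~ (not)
mask = (df["Age"] >= 30) & (df["City"] != "Chicago")
selected = df[mask]

# isin tests membership in a list of values
in_cities = df[df["City"].isin(["New York", "Chicago"])]

print(selected)
print(in_cities)
```

Python's `and` and `or` keywords do not work element-wise on a `Series`, which is why the operators `&` and `|` are used here.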


---

### 5. Cleaning and Processing Data

In [14]:
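# Create a DataFrame that contains missing values
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, None, 35],
    'City': ['New York', 'Los Angeles', None]
}
df = pd.DataFrame(data)

# Check for missing values (True marks a missing cell)
print(df.isnull())

# Drop every row that contains at least one missing value
df_cleaned = df.dropna()
print(df_cleaned)

# Fill all missing values with the same placeholder
df_filled = df.fillna('Unknown')
print(df_filled)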


    Name    Age   City
0  False  False  False
1  False   True  False
2  False  False   True
    Name   Age      City
0  Alice  25.0  New York
      Name      Age         City
0    Alice     25.0     New York
1      Bob  Unknown  Los Angeles
2  Charlie     35.0      Unknown
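Filling every gap with the string `'Unknown'` turns the numeric `Age` column into a mix of numbers and text, as the output above shows. One alternative is to pass `fillna` a dictionary with one fill value per column. This is a sketch only: using the column mean for `Age` is an assumption made for illustration, not a recommendation for real data.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, None, 35],
    "City": ["New York", "Los Angeles", None],
})

# Fill each column with a value that suits its type
# (the mean is an illustrative choice for the numeric column)
filled = df.fillna({"Age": df["Age"].mean(), "City": "Unknown"})

print(filled)
print(filled.dtypes)
```

With this approach `Age` stays numeric, so later arithmetic on the column still works.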


In [15]:
# Convert the Age column to the nullable integer type (capital "I"), which allows missing values
df['Age'] = df['Age'].astype('Int64')
print(df)

      Name   Age         City
0    Alice    25     New York
1      Bob  <NA>  Los Angeles
2  Charlie    35         None
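The capital "I" in `'Int64'` matters. The sketch below contrasts it with the NumPy-backed `'int64'`, which cannot hold a missing value; the variable names are made up for illustration.

```python
import pandas as pd

ages = pd.Series([25, None, 35])   # stored as float64 because of the missing value

try:
    ages.astype("int64")           # NumPy integers cannot represent NaN
    converted = True
except (ValueError, TypeError) as exc:
    converted = False
    print("astype('int64') failed:", exc)

nullable = ages.astype("Int64")    # pandas nullable integer dtype
print(nullable)
```

The nullable version keeps the gap as `<NA>` and shows the remaining values as integers.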


In [6]:
import pandas as pd

# Raw string, so the backslashes in the Windows path are not treated as escape sequences
path = r"G:\VScode project\Python\iris_dataset.csv"

data = pd.read_csv(path)

data

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa
...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,virginica
146,6.3,2.5,5.0,1.9,virginica
147,6.5,3.0,5.2,2.0,virginica
148,6.2,3.4,5.4,2.3,virginica
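The `Unnamed: 0` column in the output above suggests that the file was saved together with its index, although that is an assumption about how the file was produced. If so, `index_col=0` tells `read_csv` to use that first column as the index instead. The sketch uses a two-row in-memory stand-in rather than the real file.

```python
import io
import pandas as pd

# A small stand-in for a CSV that was saved with its index
# (note the empty first header field)
csv_text = """,sepal length (cm),species
0,5.1,setosa
1,4.9,setosa
"""

default = pd.read_csv(io.StringIO(csv_text))
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default.columns))
print(list(indexed.columns))
```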
