## Welcome to ModelWhale Notebook

Here you can write code and documentation.

### About the file directories


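**project**: the project directory is this project's workspace. Put every file needed to run the project here; files you add, delete, or modify in this directory are preserved.


**input**: the input directory is where datasets are mounted. Every dataset mounted into the project appears here; when no dataset is mounted, the input directory is hidden.


**temp**: the temp directory is temporary disk space. Non-essential files produced during training or analysis can go here; files in this directory are not saved.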
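
The three directories can also be inspected from Python. The following is a minimal sketch, not part of any ModelWhale API: the helper `describe_workspace` is hypothetical, and while `/home/mw/project` and `/home/mw/input` are the paths used elsewhere in this notebook, the `/home/mw/temp` location is an assumption based on the same layout.

```python
from pathlib import Path


def describe_workspace(root="/home/mw"):
    """Map each directory name to its entry count, or None if it is absent.

    Hypothetical helper; `root` and the `temp` location are assumptions.
    """
    summary = {}
    for name in ("project", "input", "temp"):
        path = Path(root) / name
        # input is hidden when no dataset is mounted, so a missing directory is expected
        summary[name] = len(list(path.iterdir())) if path.is_dir() else None
    return summary


for name, count in describe_workspace().items():
    print(name, "missing" if count is None else f"{count} entries")
```

A `None` entry for `input` would simply indicate that no dataset is mounted yet.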


In [2]:
# Try this classic example
print("hello ModelWhale")

hello ModelWhale


In [3]:
# List the files in your persistent workspace
!ls /home/mw/project/

In [1]:
# List the currently mounted dataset directories
!ls /home/mw/input/

bulubulu3820


In [1]:
# Import libraries
import pandas as pd
from sqlalchemy import create_engine

In [2]:
# Load the dataset
appointments = pd.read_csv('/home/mw/input/bulubulu3820/shopping_trends.csv')

In [3]:
# Copy the data into an in-memory SQLite database as table `appts`
engine = create_engine('sqlite:///:memory:')
appointments.to_sql('appts', con=engine, index=False)

In [4]:
# Load the dataset again as `df` for the pandas analysis
df = pd.read_csv('/home/mw/input/bulubulu3820/shopping_trends.csv')

In [5]:
df['Date'] = pd.to_datetime(df['Date'], format='%Y/%m/%d')
df['Month'] = df['Date'].dt.month  # add a month feature
df['Age_Group'] = pd.cut(df['Age'], bins=[0, 18, 30, 50, 100], 
                        labels=['未成年','青年','中年','老年'])
# Age bands: 未成年 = minor, 青年 = young adult, 中年 = middle-aged, 老年 = senior
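
To make the bin boundaries concrete, here is a small sketch on made-up ages. It reuses the bin edges from the cell above; the English labels and the sample values are illustrative assumptions, not data from the dataset. By default `pd.cut` builds right-closed intervals, so an age of exactly 18 falls in the first band.

```python
import pandas as pd

# Made-up ages sitting on and just past each bin edge
ages = pd.Series([18, 19, 30, 31, 50, 51])

# Same edges as the notebook cell; intervals are (0, 18], (18, 30], (30, 50], (50, 100]
groups = pd.cut(ages, bins=[0, 18, 30, 50, 100],
                labels=["minor", "young", "middle-aged", "senior"])
print(groups.tolist())
```

Values outside the outer edges (for example an age of 0 or 101) would come back as NaN rather than raising an error.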

In [12]:
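query = """
SELECT
    STRFTIME('%Y-%m', Date) AS Month,
    Category,
    COUNT(*) AS Sales_Volume,
    AVG("Purchase Amount ") AS Avg_Spending
FROM appts
GROUP BY Month, Category
ORDER BY Month, Sales_Volume DESC
"""
# SQL version (suited to large data volumes). The table is `appts`, the name passed to
# to_sql earlier, and the column header keeps its trailing space. STRFTIME only parses
# ISO dates (YYYY-MM-DD), so Date values stored as YYYY/MM/DD would yield NULL months.
# Pandas version (for quick visualization) follows; first check the column names.
print("Columns in DataFrame:", df.columns.tolist())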

Columns in DataFrame: ['Date', 'Customer ID', 'Age', 'Gender', 'Item Purchased', 'Category', 'Purchase Amount ', 'Location', 'Size', 'Color', 'Season', 'Review Rating', 'Subscription Status', 'Discount Applied', 'Promo Code Used', 'Month', 'Age_Group']
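
The `Date` column is parsed with the format `%Y/%m/%d` in an earlier cell, which suggests the raw values use slashes. SQLite's `STRFTIME` does not recognize that layout, so a query on the raw text would return `NULL` for the month. The sketch below illustrates this with the standard-library `sqlite3` module on a throwaway table; the table name `demo` and its two rows are made up for illustration, and the `REPLACE` workaround assumes the dates are zero-padded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (Date TEXT)")  # hypothetical table
conn.executemany("INSERT INTO demo VALUES (?)",
                 [("2023/01/15",), ("2023-01-15",)])

rows = conn.execute(
    """
    SELECT Date,
           STRFTIME('%Y-%m', Date)                    AS raw_month,
           STRFTIME('%Y-%m', REPLACE(Date, '/', '-')) AS fixed_month
    FROM demo
    ORDER BY rowid
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
```

The slash-separated row yields `None` for `raw_month` and a usable month only after the separators are normalized.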


In [15]:
import pandas as pd

# Print all column names
print("All columns:", df.columns.tolist())

# Check for likely spelling variants
possible_names = ['Purchase Amount', 'PurchaseAmount', 'Purchase_Amount', 'purchase_amount']
existing_cols = [col for col in df.columns if any(name in col for name in possible_names)]

print("Matching columns:", existing_cols)
# The header carries a trailing space; this rename is a no-op when it is already 'Purchase Amount '
df = df.rename(columns={'Purchase Amount': 'Purchase Amount '})

All columns: ['Date', 'Customer ID', 'Age', 'Gender', 'Item Purchased', 'Category', 'Purchase Amount ', 'Location', 'Size', 'Color', 'Season', 'Review Rating', 'Subscription Status', 'Discount Applied', 'Promo Code Used', 'Month', 'Age_Group']
Matching columns: ['Purchase Amount ']
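
An alternative to carrying the trailing space around is to strip whitespace from every header once, right after loading. The sketch below shows the idea on a toy frame (`demo` is made up). Note that the remaining cells in this notebook refer to `'Purchase Amount '` with the space, so adopting this would mean updating those references too.

```python
import pandas as pd

# Toy frame with a trailing space in one header, mimicking 'Purchase Amount '
demo = pd.DataFrame({"Category": ["A", "B"], "Purchase Amount ": [10, 20]})

# rename applies the function to every column label
cleaned = demo.rename(columns=str.strip)
print(cleaned.columns.tolist())
```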


In [17]:
# Pandas version (for quick visualization)
trend = df.groupby(['Month', 'Category']).agg(
    Sales_Volume=('Item Purchased', 'count'),
    Avg_Spending=('Purchase Amount ', 'mean')
).reset_index()

In [18]:
# Spending differences by gender
gender_analysis = df.groupby('Gender').agg(
    Avg_Spending=('Purchase Amount ', 'mean'),
    Favorite_Category=('Category', lambda x: x.mode()[0])
)

# Color preference by age group (row percentages)
age_color = pd.crosstab(df['Age_Group'], df['Color'], normalize='index') * 100
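
As a quick illustration of what `normalize='index'` does, the sketch below runs the same `pd.crosstab` call on four made-up rows. With row normalization each age group's shares add up to 100 after the multiplication, so the numbers read as "percentage of this group that chose each color".

```python
import pandas as pd

# Made-up sample: three purchases by one group, one by another
demo = pd.DataFrame({
    "Age_Group": ["young", "young", "young", "senior"],
    "Color": ["Red", "Blue", "Red", "Red"],
})

shares = pd.crosstab(demo["Age_Group"], demo["Color"], normalize="index") * 100
row_totals = shares.sum(axis=1)

print(shares.round(1))
print(row_totals)
```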

In [19]:
# Top 10 locations by total spending
top_locations = df.groupby('Location')['Purchase Amount '].sum().nlargest(10)

# Seasonal product strategy
season_strategy = df.groupby(['Season', 'Category']).agg(
    Sales=('Item Purchased', 'count'),
    Avg_Rating=('Review Rating', 'mean')
).sort_values(['Season', 'Sales'], ascending=False)

In [22]:
import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))
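# Keyword arguments are required by pandas 2.x and also work on older versions
heatmap_data = trend.pivot(index="Month", columns="Category", values="Sales_Volume")
sns.heatmap(heatmap_data, annot=True, fmt=".0f", cmap="YlGnBu")
plt.title('Monthly Sales Heatmap')
plt.savefig('/home/mw/project/monthly_sales.png')  # save into the project directory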
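
The heatmap relies on reshaping the long `trend` table into a Month-by-Category grid. The sketch below shows that reshaping step in isolation on three made-up rows (`long_form` is illustrative, not real output). Combinations that never occur become NaN, which the heatmap would leave blank; calling `.fillna(0)` first is one option if zeros are preferred.

```python
import pandas as pd

# Made-up long-form table with one missing Month/Category combination
long_form = pd.DataFrame({
    "Month": [1, 1, 2],
    "Category": ["Clothing", "Footwear", "Clothing"],
    "Sales_Volume": [5, 3, 4],
})

# One row per month, one column per category
wide = long_form.pivot(index="Month", columns="Category", values="Sales_Volume")
print(wide)
```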

In [21]:
from pyecharts.charts import Radar

radar = Radar()
radar.add_schema(schema=[
    {"name": "Red", "max": 30},
    {"name": "Blue", "max": 30},
    {"name": "Green", "max": 30},
    {"name": "Black", "max": 30}
])
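# Select only the four colors declared in the schema so values line up with the indicators
# (assumes these color names exist in the dataset); 青年 is the young-adult age band
schema_colors = ["Red", "Blue", "Green", "Black"]
radar.add("Age Group", [age_color.loc['青年', schema_colors].tolist()])
radar.render('color_preference.html')  # interactive chart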

'/home/mw/project/color_preference.html'

In [25]:
radar.render_notebook()

In [24]:
from IPython.display import IFrame

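# Embed the HTML directly. The browser resolves src as a URL, so use a path relative
# to the notebook (assumed to live in /home/mw/project) rather than a filesystem path.
IFrame(src='color_preference.html', width=800, height=600)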

In [27]:
# A bare `pip install` is rejected by this IPython version; prefix it with `!` to run it in the shell
!pip install --upgrade pandas -i https://pypi.tuna.tsinghua.edu.cn/simple
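
When installing from inside a notebook, running pip through the kernel's own interpreter helps ensure the package lands in the environment the notebook actually uses. The sketch below only builds the command; `pip_install_command` is a hypothetical helper, and the mirror URL is the one used in this notebook.

```python
import sys


def pip_install_command(package, index_url=None, upgrade=True):
    """Build an argument list that runs pip with the current interpreter (hypothetical helper)."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        cmd.append("--upgrade")
    cmd.append(package)
    if index_url:
        cmd += ["-i", index_url]
    return cmd


cmd = pip_install_command("pandas", "https://pypi.tuna.tsinghua.edu.cn/simple")
print(" ".join(cmd))
# To actually install: import subprocess; subprocess.run(cmd, check=True)
```

After an upgrade, the kernel usually needs a restart before the new version is imported.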
