## Optimization Club
- Sample data: https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page

In [1]:
import numpy as np
import pandas as pd

In [2]:
df = pd.read_parquet('yellow_tripdata_2022-04.parquet')

In [3]:
len(df)

3599920

In [4]:
df.columns

Index(['VendorID', 'tpep_pickup_datetime', 'tpep_dropoff_datetime',
       'passenger_count', 'trip_distance', 'RatecodeID', 'store_and_fwd_flag',
       'PULocationID', 'DOLocationID', 'payment_type', 'fare_amount', 'extra',
       'mta_tax', 'tip_amount', 'tolls_amount', 'improvement_surcharge',
       'total_amount', 'congestion_surcharge', 'airport_fee'],
      dtype='object')

In [12]:
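df.head  # without parentheses this displays the bound method (and the whole frame); call df.head() for the first five rows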

<bound method NDFrame.head of          VendorID tpep_pickup_datetime tpep_dropoff_datetime  passenger_count  \
0               1  2022-04-01 00:21:13   2022-04-01 00:58:33              1.0   
1               1  2022-04-01 00:07:47   2022-04-01 00:19:12              0.0   
2               1  2022-04-01 00:14:52   2022-04-01 00:23:43              1.0   
3               1  2022-04-01 00:30:02   2022-04-01 00:45:06              1.0   
4               2  2022-04-01 00:48:40   2022-04-01 01:03:34              1.0   
...           ...                  ...                   ...              ...   
3599915         2  2022-04-30 23:10:00   2022-04-30 23:28:00              NaN   
3599916         1  2022-04-30 23:28:59   2022-04-30 23:46:38              NaN   
3599917         2  2022-04-30 23:27:25   2022-04-30 23:45:00              NaN   
3599918         2  2022-04-30 23:24:26   2022-04-30 23:43:45              NaN   
3599919         2  2022-04-30 23:17:00   2022-04-30 23:31:00              NaN  

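### Rules:
[The Rules of Optimization Club:](https://wiki.c2.com/?RulesOfOptimizationClub)

1. You do not optimize.
2. You do not optimize without measuring first.
3. When performance is bound not by your code but by external factors, the optimization is over.
4. Only optimize code that already has full unit test coverage.
5. One factor at a time.
6. No unresolved bugs, no schedule pressure.
7. Testing will go on as long as it has to.
8. If this is your first time at Optimization Club, you have to write a test case.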
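
Rules 4 and 8 ask for a test case before any optimization starts. The following is a minimal sketch of what that could look like for the per-vendor tip totals measured later in this notebook. The function name `tips_by_vendor_loop` and the tiny hand-made frame are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd


def tips_by_vendor_loop(frame):
    """Baseline: filter the frame once per vendor, then sum the tips."""
    totals = {}
    for vid in frame["VendorID"].unique():
        totals[int(vid)] = float(frame[frame["VendorID"] == vid]["tip_amount"].sum())
    return totals


def test_tips_by_vendor_loop():
    # Hand-made data whose totals are easy to verify by eye.
    sample = pd.DataFrame(
        {"VendorID": [1, 2, 1, 2, 2], "tip_amount": [1.0, 2.0, 3.0, 4.0, 0.5]}
    )
    assert tips_by_vendor_loop(sample) == {1: 4.0, 2: 6.5}


test_tips_by_vendor_loop()
```

With a test like this in place, any faster replacement can be checked against the same expected totals before its timings are taken seriously.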


In [5]:
%time df = pd.read_parquet('yellow_tripdata_2022-04.parquet')

CPU times: user 613 ms, sys: 305 ms, total: 918 ms
Wall time: 412 ms


In [6]:
df.to_csv('yellow_tripdata_2022-04.csv', index=False)

In [9]:
ls -hs *

    16 faster_less_memory.ipynb         164352 yellow_tripdata_2022-04.csv.gz
755112 yellow_tripdata_2022-04.csv      108608 yellow_tripdata_2022-04.parquet
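
The listing shows a `.csv.gz` file that no cell in this notebook creates, so it was presumably produced outside the notebook. One possible way to produce such a file directly from pandas is sketched below. The small frame and the temporary file names are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

# Tiny stand-in for the trip data.
frame = pd.DataFrame({"VendorID": [1, 2, 2], "tip_amount": [1.5, 0.0, 3.25]})

with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "sample.csv")
    gz_path = os.path.join(tmp, "sample.csv.gz")

    frame.to_csv(plain_path, index=False)
    # Explicit gzip compression; pandas can also infer it from the ".gz" suffix.
    frame.to_csv(gz_path, index=False, compression="gzip")

    # read_csv decompresses transparently based on the file extension.
    restored = pd.read_csv(gz_path)

assert restored.equals(frame)
```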


In [10]:
%time df = pd.read_csv('yellow_tripdata_2022-04.csv.gz')



CPU times: user 3.68 s, sys: 516 ms, total: 4.2 s
Wall time: 4.45 s


In [11]:
%timeit df['tip_amount'].sum()

1.56 ms ± 23.7 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


## A look at magic commands

In [13]:
%%timeit
totals = {}
for vid in df['VendorID'].unique():
    totals[vid] = df[df['VendorID']==vid]['tip_amount'].sum()

173 ms ± 16.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


### %%prun
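Use `%%prun` when you need to optimize code and find its bottlenecks. It is particularly useful for:

- Code optimization: identifying, during development, which parts of the code need optimizing.
- Teaching: showing students a performance profile of running code.
- Debugging: understanding where time goes during execution.

> Output

The `%%prun` report appears below the executed cell. Its columns are `ncalls`, `tottime`, `percall`, `cumtime`, `percall`, and `filename:lineno(function)`, and it is ordered by internal time unless told otherwise. The `-s` option changes the sort order and accepts the following keys:

- `calls`: call count
- `cumulative`: cumulative time
- `file`: file name
- `module`: file name
- `pcalls`: primitive call count
- `line`: line number
- `name`: function name
- `nfl`: name/file/line
- `stdname`: standard name
- `time`: internal time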
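
The keys listed above match the sort keys of the standard-library `pstats` module, so a comparable report can be produced outside a notebook with `cProfile` and `pstats`. The sketch below assumes a stand-in function, `slow_sum`, which is hypothetical and used only to give the profiler something to measure.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately plain loop so the profiler has work to record."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100_000)
profiler.disable()

# Render the report into a string, sorted by cumulative time
# (the equivalent of `%%prun -s cumulative`), limited to 5 rows.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```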


In [16]:
%%prun 
totals = {}
for vid in df['VendorID'].unique():
    totals[vid] = df[df['VendorID']==vid]['tip_amount'].sum()

 

         3379 function calls (3299 primitive calls) in 0.596 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       12    0.461    0.038    0.489    0.041 take.py:120(_take_nd_ndarray)
        1    0.037    0.037    0.037    0.037 algorithms.py:427(unique_with_mask)
        1    0.028    0.028    0.594    0.594 <string>:1(<module>)
       12    0.026    0.002    0.026    0.002 {built-in method numpy.empty}
        4    0.009    0.002    0.009    0.002 {method 'nonzero' of 'numpy.ndarray' objects}
        4    0.003    0.001    0.004    0.001 utils.py:239(maybe_convert_indices)
        4    0.003    0.001    0.003    0.001 necompiler.py:977(re_evaluate)
        5    0.003    0.001    0.003    0.001 {method 'astype' of 'numpy.ndarray' objects}
       32    0.003    0.000    0.003    0.000 {method 'reduce' of 'numpy.ufunc' objects}
        4    0.003    0.001    0.003    0.001 missing.py:261(_isna_array)
        2    0.001    

In [17]:
%%prun -s cumulative
totals = {}
for vid in df['VendorID'].unique():
    totals[vid] = df[df['VendorID']==vid]['tip_amount'].sum()

 

         3380 function calls (3300 primitive calls) in 0.536 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       13    0.001    0.000    0.485    0.037 frame.py:4062(__getitem__)
        4    0.000    0.000    0.483    0.121 frame.py:4130(_getitem_bool_array)
        4    0.000    0.000    0.459    0.115 generic.py:4142(_take_with_is_copy)
        4    0.000    0.000    0.459    0.115 generic.py:4027(take)
        4    0.000    0.000    0.459    0.115 managers.py:869(take)
        4    0.000    0.000    0.450    0.112 managers.py:623(reindex_indexer)
       12    0.000    0.000    0.450    0.037 blocks.py:1287(take_nd)
       12    0.000    0.000    0.449    0.037 take.py:59(take_nd)
       12    0.424    0.035    0.449    0.037 take.py:120(_take_nd_ndarray)
        1    0.020    0.020    0.164    0.164 <string>:1(<module>)
       12    0.024    0.002    0.024    0.002 {built-in method numpy.empty}
        4    0.022  

In [18]:
%%timeit
totals = {}
for vid in df['VendorID'].unique():
    totals[vid] = df[df['VendorID']==vid]['tip_amount'].sum()

168 ms ± 2.83 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [20]:
%timeit df.groupby('VendorID')['tip_amount'].sum()

48.3 ms ± 1.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
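
Before adopting the faster `groupby` version, it is worth confirming that it returns the same totals as the loop. The sketch below does this on synthetic data. The random frame is an assumption standing in for the trip records, and it contains no missing `VendorID` values.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the trip data: three vendors, 1,000 trips.
rng = np.random.default_rng(0)
frame = pd.DataFrame(
    {
        "VendorID": rng.integers(1, 4, size=1_000),
        "tip_amount": rng.random(1_000).round(2),
    }
)

# Original approach: one boolean filter per vendor.
loop_totals = {}
for vid in frame["VendorID"].unique():
    loop_totals[vid] = frame[frame["VendorID"] == vid]["tip_amount"].sum()

# Candidate replacement: a single grouped aggregation.
grouped_totals = frame.groupby("VendorID")["tip_amount"].sum()

# Same vendors, same totals (up to floating-point rounding).
assert len(grouped_totals) == len(loop_totals)
for vid, total in loop_totals.items():
    assert np.isclose(grouped_totals.loc[vid], total)
```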


### snakeviz, py-spy...
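
Visual tools such as snakeviz work from a saved profile rather than from the text report. A minimal sketch of saving one with the standard library follows. The function `busy_work` and the file name `busy_work.prof` are hypothetical, and viewing the file with `snakeviz busy_work.prof` assumes snakeviz is installed separately. py-spy, by contrast, is a sampling profiler run from the command line against a separate Python process, so it is not shown here.

```python
import cProfile
import os
import pstats
import tempfile


def busy_work(n):
    """Stand-in workload for the profiler."""
    return sum(i * i for i in range(n))


profile_path = os.path.join(tempfile.mkdtemp(), "busy_work.prof")

profiler = cProfile.Profile()
profiler.runcall(busy_work, 200_000)
profiler.dump_stats(profile_path)  # binary profile file for external viewers

# The same file can be reloaded and inspected as text with pstats.
loaded = pstats.Stats(profile_path)
loaded.sort_stats("tottime").print_stats(3)
```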

In [21]:
df10k = df[:10_000]

In [25]:
%%timeit
total = 0
for _, row in df10k.iterrows():
    if row['VendorID'] == 2:
        total += row['tip_amount']

120 ms ± 2.44 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [27]:
%timeit total = df10k[df10k['VendorID']==2]['tip_amount'].sum()

269 μs ± 14.8 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
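
The same comparison can be run outside Jupyter with the standard-library `timeit` module. The sketch below assumes a synthetic 10,000-row frame in place of `df10k`. The function names are hypothetical, and the printed timings will vary by machine.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for df10k.
rng = np.random.default_rng(0)
small = pd.DataFrame(
    {
        "VendorID": rng.integers(1, 3, size=10_000),
        "tip_amount": rng.random(10_000),
    }
)


def tip_total_iterrows(frame):
    """Row-by-row Python loop, as in the slow cell above."""
    total = 0.0
    for _, row in frame.iterrows():
        if row["VendorID"] == 2:
            total += row["tip_amount"]
    return total


def tip_total_mask(frame):
    """Vectorized boolean mask followed by a single sum."""
    return frame.loc[frame["VendorID"] == 2, "tip_amount"].sum()


# Check correctness first, then measure.
assert np.isclose(tip_total_iterrows(small), tip_total_mask(small))

loop_seconds = timeit.timeit(lambda: tip_total_iterrows(small), number=3) / 3
mask_seconds = timeit.timeit(lambda: tip_total_mask(small), number=100) / 100
print(f"iterrows: {loop_seconds * 1e3:.2f} ms per run")
print(f"mask:     {mask_seconds * 1e6:.1f} us per run")
```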
