# 1122_DS_Lab3: Data Aggregation and Masking Applied to Stock Price Analysis

# Aggregations: Min, Max, and Everything In Between
Often when faced with a large amount of data, a first step is to compute summary statistics for the data in question. Perhaps the most common summary statistics are the mean and standard deviation, which allow you to summarize the "typical" values in a dataset, but other aggregates are useful as well (the sum, product, median, minimum and maximum, quantiles, etc.).
Most of the text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and the code is released under the [MIT license](https://opensource.org/licenses/MIT).

In [1]:
import numpy as np
A = np.random.random(100)
A

array([0.42279991, 0.35998898, 0.14886168, 0.61965131, 0.32210027,
       0.95006799, 0.89974475, 0.85420915, 0.77389179, 0.29078473,
       0.9777984 , 0.37326739, 0.77789888, 0.25737516, 0.8357666 ,
       0.62631503, 0.82620536, 0.74834062, 0.83238126, 0.76833411,
       0.62450954, 0.72619293, 0.42294394, 0.58385335, 0.79880044,
       0.43727477, 0.71926439, 0.74445899, 0.87462603, 0.74101022,
       0.66144492, 0.78324596, 0.4718039 , 0.92503629, 0.4074385 ,
       0.71347457, 0.72408925, 0.90626212, 0.94097179, 0.85832337,
       0.32476509, 0.75933321, 0.98770791, 0.87762468, 0.76384662,
       0.24030858, 0.95311188, 0.44918535, 0.82891905, 0.76255395,
       0.58529194, 0.92454803, 0.9416578 , 0.21009648, 0.34153404,
       0.16356494, 0.20381682, 0.78175091, 0.39987805, 0.45622992,
       0.58232736, 0.77252479, 0.00888427, 0.57507631, 0.53905071,
       0.38349389, 0.26617078, 0.95540747, 0.95838331, 0.45003795,
       0.39618716, 0.75981636, 0.82720135, 0.80490546, 0.00237

In [2]:
np.sum(A)
%timeit sum(A)
%timeit np.sum(A)

10.1 µs ± 67.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
4.66 µs ± 45 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [5]:
big_array = np.random.rand(1000)
%timeit sum(big_array)
%timeit np.sum(big_array)

92.8 µs ± 717 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
5.01 µs ± 55.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [6]:
np.min(big_array), np.max(big_array)  # faster than Python's built-in min and max

(6.193893556472041e-05, 0.9985321485195751)

In [8]:
M = np.random.random((3, 4))
print(M)

[[0.53993008 0.06225758 0.5669053  0.26649228]
 [0.91195733 0.54594808 0.37898686 0.48552897]
 [0.11024725 0.79656082 0.29759867 0.9587755 ]]


In [14]:
print(sum(M))

[1.56213465 1.40476649 1.24349083 1.71079676]


In [9]:
print(sum(sum(M)))

5.921188737269386


In [10]:
M.sum()

5.921188737269385
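
The cells above differ because Python's built-in `sum` iterates over the first axis of a two-dimensional array and adds the rows together, while the array's own `sum` method aggregates every element by default. A minimal sketch of that difference, using a small hand-made array `M_demo` (a hypothetical name with illustrative values, not the random `M` above):

```python
import numpy as np

# Small illustrative array (not the random M from the cells above)
M_demo = np.array([[1, 2, 3],
                   [4, 5, 6]])

# Built-in sum adds the rows element-wise, which matches axis=0
builtin_result = sum(M_demo)
column_totals = M_demo.sum(axis=0)

# The ndarray method without an axis collapses every dimension
grand_total = M_demo.sum()

print(builtin_result)   # [5 7 9]
print(column_totals)    # [5 7 9]
print(grand_total)      # 21
```

This is why `sum(sum(M))` is needed to reach a single number with the built-in function, whereas `M.sum()` gets there in one call.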

## Aggregating Along an Axis
Aggregation functions take an additional argument specifying the axis along which the aggregate is computed. For example, we can find the sum within each column by specifying `axis=0`. The `axis` keyword specifies the dimension of the array that will be collapsed, rather than the dimension that will be returned. So specifying `axis=0` means that the first axis will be collapsed: for two-dimensional arrays, this means that values within each column will be aggregated.

![Axis example](axis.jpg)

In [11]:
M.sum(axis=0)

array([1.56213465, 1.40476649, 1.24349083, 1.71079676])

In [13]:
M.sum(axis=1)

array([1.43558525, 2.32242124, 2.16318224])
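
One way to check which axis gets collapsed is to compare shapes before and after the aggregate. The sketch below assumes an illustrative 3x4 array `M_demo` (a hypothetical name, with values chosen for readability) and uses `min` and `max` in place of `sum`:

```python
import numpy as np

# Illustrative 3x4 array; the values are chosen for readability
M_demo = np.arange(12).reshape(3, 4)

# axis=0 collapses the 3 rows, leaving one value per column
col_min = M_demo.min(axis=0)
# axis=1 collapses the 4 columns, leaving one value per row
row_max = M_demo.max(axis=1)

print(M_demo.shape)   # (3, 4)
print(col_min.shape)  # (4,)
print(row_max.shape)  # (3,)
print(col_min)        # [0 1 2 3]
print(row_max)        # [ 3  7 11]
```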

The following table provides a list of useful aggregation functions available in NumPy:

|Function Name      |   NaN-safe Version  | Description                                   |
|-------------------|---------------------|-----------------------------------------------|
| ``np.sum``        | ``np.nansum``       | Compute sum of elements                       |
| ``np.prod``       | ``np.nanprod``      | Compute product of elements                   |
| ``np.mean``       | ``np.nanmean``      | Compute mean of elements                      |
| ``np.std``        | ``np.nanstd``       | Compute standard deviation                    |
| ``np.var``        | ``np.nanvar``       | Compute variance                              |
| ``np.min``        | ``np.nanmin``       | Find minimum value                            |
| ``np.max``        | ``np.nanmax``       | Find maximum value                            |
| ``np.argmin``     | ``np.nanargmin``    | Find index of minimum value                   |
| ``np.argmax``     | ``np.nanargmax``    | Find index of maximum value                   |
| ``np.median``     | ``np.nanmedian``    | Compute median of elements                    |
| ``np.percentile`` | ``np.nanpercentile``| Compute rank-based statistics of elements     |
| ``np.any``        | N/A                 | Evaluate whether any elements are true        |
| ``np.all``        | N/A                 | Evaluate whether all elements are true        |
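
The NaN-safe column matters whenever the data contain missing values. A minimal sketch, assuming a hypothetical `prices` array in which one missing entry is encoded as `np.nan`:

```python
import numpy as np

# Illustrative prices with one missing entry encoded as NaN
prices = np.array([10.0, 20.0, np.nan, 30.0])

plain_mean = np.mean(prices)      # NaN propagates through the aggregate
safe_mean = np.nanmean(prices)    # NaN entries are ignored

print(plain_mean)  # nan
print(safe_mean)   # 20.0
```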



In [31]:
import pandas as pd
import numpy as np
import requests

In [17]:
date = "20240304"
url = f'https://www.twse.com.tw/exchangeReport/MI_INDEX?response=json&date={date}&type=ALLBUT0999'
response = requests.get(url)
response_json = response.json()
stockdata = pd.DataFrame(response_json['data9'], columns=response_json['fields9'])
origin = stockdata.copy()
stockdata.head()

Unnamed: 0,證券代號,證券名稱,成交股數,成交筆數,成交金額,開盤價,最高價,最低價,收盤價,漲跌(+/-),漲跌價差,最後揭示買價,最後揭示買量,最後揭示賣價,最後揭示賣量,本益比
0,50,元大台灣50,25733539,23900,3766826897,145.0,147.2,144.85,146.95,<p style= color:red>+</p>,4.15,146.95,46,147.0,201,0.0
1,51,元大中型100,195708,312,15327688,77.95,78.55,77.95,78.0,<p style= color:red>+</p>,0.3,77.95,4,78.0,2,0.0
2,52,富邦科技,632104,866,93693852,146.95,149.25,146.95,149.05,<p style= color:red>+</p>,5.05,149.0,30,149.05,1,0.0
3,53,元大電子,19546,105,1566876,79.65,80.65,79.65,80.65,<p style= color:red>+</p>,2.3,80.3,27,80.5,1,0.0
4,55,元大MSCI金融,162964,366,3893412,23.92,23.92,23.85,23.9,<p style= color:green>-</p>,0.02,23.89,1,23.9,45,0.0


In [19]:
dayprice = np.array(stockdata['收盤價'])   # 收盤價 is the closing-price column; equivalent: dayprice = np.array(stockdata.收盤價)
print(dayprice)

['146.95' '78.00' '149.05' ... '24.15' '23.90' '252.50']


In [38]:
print(len(dayprice))
print(type(dayprice))

1225
<class 'numpy.ndarray'>


In [32]:
a = np.array(stockdata.收盤價[0:9])
print(a)

['146.95' '78.00' '149.05' '80.65' '23.90' '38.00' '110.00' '16.70'
 '70.75']


In [33]:
print(len(a))
print(type(a))

9
<class 'numpy.ndarray'>


In [34]:
a= a.astype(float)  # change data type (string to float)
print(a)

[146.95  78.   149.05  80.65  23.9   38.   110.    16.7   70.75]


In [35]:
print(a.mean())

79.33333333333333


In [37]:
print("Mean closing price:", a.mean())
print("Standard deviation of closing price:", a.std())
print("Minimum closing price:    ", a.min())
print("Maximum closing price:    ", a.max())
print("25th percentile:   ", np.percentile(a, 25))
print("Median:            ",  np.median(a))
print("75th percentile:   ", np.percentile(a, 75))

Mean closing price: 79.33333333333333
Standard deviation of closing price: 46.13300818767886
Minimum closing price:     16.7
Maximum closing price:     149.05
25th percentile:    38.0
Median:             78.0
75th percentile:    110.0


In [39]:
# dayprice still holds strings here, so the first aggregate raises the TypeError shown below
print("Mean closing price:", dayprice.mean())
print("Standard deviation of closing price:", dayprice.std())
print("Minimum closing price:    ", dayprice.min())
print("Maximum closing price:    ", dayprice.max())
print("25th percentile:   ", np.percentile(dayprice, 25))
print("Median:            ", np.median(dayprice))
print("75th percentile:   ", np.percentile(dayprice, 75))

TypeError: ufunc 'divide' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

In [40]:
print(dayprice[0:40])

['146.95' '78.00' '149.05' '80.65' '23.90' '38.00' '110.00' '16.70'
 '70.75' '95.90' '28.80' '27.84' '21.98' '85.85' '--' '176.35' '4.13'
 '31.46' '4.77' '24.90' '19.46' '6.20' '12.49' '9.99' '10.30' '55.45'
 '6.64' '17.41' '11.32' '--' '37.16' '50.55' '80.05' '5.81' '8.57' '9.69'
 '36.37' '58.60' '6.76' '21.81']


In [41]:
b= np.array(stockdata.收盤價[0:40])

In [42]:
print(b)
print(b.size)

['146.95' '78.00' '149.05' '80.65' '23.90' '38.00' '110.00' '16.70'
 '70.75' '95.90' '28.80' '27.84' '21.98' '85.85' '--' '176.35' '4.13'
 '31.46' '4.77' '24.90' '19.46' '6.20' '12.49' '9.99' '10.30' '55.45'
 '6.64' '17.41' '11.32' '--' '37.16' '50.55' '80.05' '5.81' '8.57' '9.69'
 '36.37' '58.60' '6.76' '21.81']
40


In [43]:
b= b.astype(float)

ValueError: could not convert string to float: '--'

In [44]:
b_new=(b!='--')
print(b_new)

[ True  True  True  True  True  True  True  True  True  True  True  True
  True  True False  True  True  True  True  True  True  True  True  True
  True  True  True  True  True False  True  True  True  True  True  True
  True  True  True  True]


In [46]:
b1= b[b_new].astype(float)  # convert only the elements where b_new is True

In [47]:
print(b1)
print(b1.size)

[146.95  78.   149.05  80.65  23.9   38.   110.    16.7   70.75  95.9
  28.8   27.84  21.98  85.85 176.35   4.13  31.46   4.77  24.9   19.46
   6.2   12.49   9.99  10.3   55.45   6.64  17.41  11.32  37.16  50.55
  80.05   5.81   8.57   9.69  36.37  58.6    6.76  21.81]
38
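
Masking before conversion drops the `'--'` entries entirely. An alternative sketch, not the approach used in this lab, keeps the array length intact by turning unparsable strings into NaN with `pd.to_numeric(..., errors='coerce')` and then using a NaN-safe aggregate. The sample values in `raw` are illustrative, and reading `'--'` as "no closing price that day" is an assumption:

```python
import numpy as np
import pandas as pd

# Illustrative closing prices; '--' is assumed to mean "no closing price"
raw = np.array(['146.95', '78.00', '--', '23.90'], dtype=object)

# errors='coerce' turns anything unparsable (here '--') into NaN
values = pd.to_numeric(pd.Series(raw), errors='coerce').to_numpy()

print(values.dtype)        # float64
print(np.isnan(values))    # [False False  True False]
print(np.nanmean(values))  # mean of the three valid prices
```

Keeping the NaN in place preserves the alignment between prices and the other columns of `stockdata`, at the cost of having to remember the NaN-safe functions.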


# Comparisons, Masks, and Boolean Logic

## Comparison Operators as ufuncs
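NumPy implements comparison operators such as `<` (less than) and `>` (greater than) as element-wise ufuncs. The result of these comparison operators is always an array with a Boolean data type. All six of the standard comparison operations are available: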

In [48]:
x = np.array([1, 2, 3, 4, 5])

In [49]:
x < 3  # less than

array([ True,  True, False, False, False])

In [50]:
x > 3  # greater than

array([False, False, False,  True,  True])

In [56]:
x <= 3  # less than or equal

array([ True,  True,  True, False, False])

In [51]:
x >= 3  # greater than or equal

array([False, False,  True,  True,  True])

In [52]:
x != 3  # not equal

array([ True,  True, False,  True,  True])

In [59]:
x == 3  # equal

array([False, False,  True, False, False])

In [53]:
(2 * x) == (x ** 2)

array([False,  True, False, False, False])
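
Each operator is shorthand for a named NumPy ufunc (`np.equal`, `np.not_equal`, `np.less`, `np.less_equal`, `np.greater`, `np.greater_equal`). The sketch below checks that equivalence on the same `x`; the `pairs` dictionary is only a hypothetical helper for the comparison:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# Pair each operator expression with the ufunc call of the same meaning
pairs = {
    "==": (x == 3, np.equal(x, 3)),
    "!=": (x != 3, np.not_equal(x, 3)),
    "<":  (x < 3,  np.less(x, 3)),
    "<=": (x <= 3, np.less_equal(x, 3)),
    ">":  (x > 3,  np.greater(x, 3)),
    ">=": (x >= 3, np.greater_equal(x, 3)),
}

for op, (via_operator, via_ufunc) in pairs.items():
    # Prints True for every operator: both forms give the same Boolean array
    print(op, np.array_equal(via_operator, via_ufunc))
```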

## Working with Boolean Arrays

Given a Boolean array, there are a host of useful operations you can do.
We'll work with ``x``, redefined in the next cell as a two-dimensional array.

In [54]:
x=np.arange(12).reshape(3,4)
print(x)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


In [55]:
# how many values less than 6?
np.count_nonzero(x < 6)

6

In [57]:
# how many values less than 6 in each row?
np.sum(x<6 , axis=1)

array([4, 2, 0])

In [58]:
# are there any values greater than 8?
np.any(x > 8)

True

In [59]:
# are all values less than 12?
np.all(x < 12)

True

In [60]:
np.where(x==7)

(array([1], dtype=int64), array([3], dtype=int64))
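
With a single argument, `np.where` returns a tuple holding one index array per dimension, which is why the result above reads as row 1, column 3. A short sketch of how those indices, or the Boolean array itself, can be used to pull values out of `x`:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

# With a single argument, np.where returns one index array per dimension
rows, cols = np.where(x == 7)
print(rows, cols)        # [1] [3]

# The index arrays select the matching element(s)
print(x[rows, cols])     # [7]

# A Boolean array can also be used directly as a mask
print(x[x < 6])          # [0 1 2 3 4 5]
```

The masking form in the last line is the same technique used earlier to drop the `'--'` entries from the closing prices.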

## Analyzing the Day's Stock Prices

In [61]:
dayprice = np.array(stockdata.收盤價)

In [62]:
dayprice_yes=(dayprice!='--')
print(dayprice_yes[0:40])

[ True  True  True  True  True  True  True  True  True  True  True  True
  True  True False  True  True  True  True  True  True  True  True  True
  True  True  True  True  True False  True  True  True  True  True  True
  True  True  True  True]


In [63]:
dayprice= dayprice[dayprice_yes].astype(float)

ValueError: could not convert string to float: '1,160.00'

In [64]:
np.where(dayprice=='1,160.00')

(array([327], dtype=int64),)

In [65]:
dayprice[327]

'1,160.00'

In [66]:
dayprice[327]='1160'

In [67]:
dayprice[327]

'1160'

In [68]:
dayprice= dayprice[dayprice_yes].astype(float)

ValueError: could not convert string to float: '1,340.00'

In [69]:
np.where(dayprice=='1,340.00')

(array([419], dtype=int64),)

In [71]:
dayprice[419]='1340'

In [72]:
dayprice= dayprice[dayprice_yes].astype(float)

ValueError: could not convert string to float: '1,150.00'

In [73]:
np.where(dayprice=='1,150.00')

(array([555], dtype=int64),)

In [74]:
dayprice[555]='1150'

In [75]:
dayprice= dayprice[dayprice_yes].astype(float)

ValueError: could not convert string to float: '2,565.00'

In [76]:
np.where(dayprice=='2,565.00')

(array([721], dtype=int64),)

In [77]:
dayprice[721]='2565'

In [78]:
dayprice= dayprice[dayprice_yes].astype(float)

ValueError: could not convert string to float: '1,545.00'

In [81]:
np.where(dayprice=='1,545.00')

(array([802], dtype=int64),)

In [82]:
dayprice[802]='1545'

In [83]:
dayprice= dayprice[dayprice_yes].astype(float)

ValueError: could not convert string to float: '1,115.00'

In [84]:
np.where(dayprice=='1,115.00')

(array([815], dtype=int64),)

In [85]:
dayprice[815]='1115'

In [86]:
dayprice= dayprice[dayprice_yes].astype(float)

ValueError: could not convert string to float: '4,200.00'

In [87]:
np.where(dayprice=='4,200.00')

(array([836], dtype=int64),)

In [88]:
dayprice[836]='4200'

In [89]:
dayprice= dayprice[dayprice_yes].astype(float)

ValueError: could not convert string to float: '1,050.00'

In [90]:
np.where(dayprice=='1,050.00')

(array([899], dtype=int64),)

In [91]:
dayprice[899]='1050'

In [92]:
dayprice= dayprice[dayprice_yes].astype(float)

ValueError: could not convert string to float: '2,390.00'

In [93]:
np.where(dayprice=='2,390.00')

(array([938], dtype=int64),)

In [94]:
dayprice[938]='2390'

In [95]:
dayprice= dayprice[dayprice_yes].astype(float)

ValueError: could not convert string to float: '1,645.00'

In [96]:
np.where(dayprice=='1,645.00')

(array([1026], dtype=int64),)

In [97]:
dayprice[1026]='1645'

In [98]:
dayprice= dayprice[dayprice_yes].astype(float)

ValueError: could not convert string to float: '2,445.00'

In [99]:
np.where(dayprice=='2,445.00')

(array([1076], dtype=int64),)

In [100]:
dayprice[1076]='2445'

In [101]:
dayprice= dayprice[dayprice_yes].astype(float)

In [102]:
print("Mean closing price for the day:", dayprice.mean())
print("Standard deviation of closing price for the day:", dayprice.std())
print("Minimum closing price for the day:    ", dayprice.min())
print("Maximum closing price for the day:    ", dayprice.max())
print("25th percentile:   ", np.percentile(dayprice, 25))
print("Median:            ", np.median(dayprice))
print("75th percentile:   ", np.percentile(dayprice, 75))

Mean closing price for the day: 85.83610057708162
Standard deviation of closing price for the day: 215.51474451973013
Minimum closing price for the day:     1.17
Maximum closing price for the day:     4200.0
25th percentile:    19.75
Median:             37.55
75th percentile:    76.1
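
Fixing the thousands separators one index at a time works, but it has to be repeated for every price above 1,000. A possible shortcut, sketched here on a small illustrative sample rather than the full TWSE download, removes every comma in one pass with the pandas string method `str.replace` and then coerces the `'--'` placeholders to NaN. The names `raw`, `cleaned`, and `prices` are hypothetical:

```python
import numpy as np
import pandas as pd

# Illustrative sample mixing plain prices, '--' placeholders, and
# prices with thousands separators (values taken from the cells above)
raw = pd.Series(['146.95', '--', '1,160.00', '4,200.00', '23.90'])

# Remove every comma in one pass, then coerce '--' to NaN
cleaned = pd.to_numeric(raw.str.replace(',', '', regex=False), errors='coerce')

# Drop the missing entries and hand a float array back to NumPy
prices = cleaned.dropna().to_numpy()

print(prices.dtype)   # float64
print(prices.size)    # 4
print(prices.mean(), prices.std())
```

Applied to `stockdata['收盤價']`, the same two steps would likely replace the whole sequence of manual corrections, though that has not been run against the live data here.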


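## Assignment 2 (Due 2024/03/26): Using the data for 2024/03/11 (that is, `date = "20240311"` in the format used above), analyze the mean and standard deviation of the market's opening prices (開盤價) and closing prices (收盤價).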