# 1102_DS_Lab3  Data Aggregation and Masking Applied to Stock Price Analysis

# Aggregations: Min, Max, and Everything In Between
Often when faced with a large amount of data, a first step is to compute summary statistics for the data in question. Perhaps the most common summary statistics are the mean and standard deviation, which allow you to summarize the "typical" values in a dataset, but other aggregates are useful as well (the sum, product, median, minimum and maximum, quantiles, etc.).

Most of the text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and the code is released under the [MIT license](https://opensource.org/licenses/MIT).

In [17]:
import numpy as np
A = np.random.random(100)
A

array([0.36687156, 0.56343383, 0.5319684 , 0.70289416, 0.63991789,
       0.93689598, 0.19281939, 0.12220232, 0.51933253, 0.55333325,
       0.580578  , 0.06039158, 0.09125848, 0.11088746, 0.63931121,
       0.96033583, 0.73850135, 0.59221175, 0.26250417, 0.04786638,
       0.96468634, 0.5772507 , 0.85177714, 0.34719934, 0.36907193,
       0.9930666 , 0.19561913, 0.06121447, 0.6112453 , 0.3379239 ,
       0.0422601 , 0.22893411, 0.66597758, 0.34065494, 0.47626283,
       0.94933225, 0.82501025, 0.78917022, 0.87301613, 0.94137133,
       0.3918407 , 0.39530503, 0.69463332, 0.80846314, 0.37021874,
       0.56558278, 0.81635591, 0.30668005, 0.42334652, 0.41257283,
       0.27633536, 0.06628478, 0.00425122, 0.64836145, 0.9848625 ,
       0.83166117, 0.39919428, 0.9602561 , 0.5482615 , 0.25888262,
       0.01940508, 0.52434502, 0.75039828, 0.30483573, 0.77474266,
       0.80899246, 0.43326779, 0.37551494, 0.84774832, 0.95814045,
       0.53037815, 0.79290932, 0.84071495, 0.40789278, 0.41989

In [7]:
np.sum(A)
%timeit sum(A)
%timeit np.sum(A)

12.7 µs ± 143 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
2.84 µs ± 27.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [8]:
big_array = np.random.rand(1000)
%timeit sum(big_array)
%timeit np.sum(big_array)

121 µs ± 6.05 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
3.56 µs ± 83.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
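
The `%timeit` magics above only run inside IPython or Jupyter. As a rough sketch, the same comparison can be made in a plain Python script with the standard-library `timeit` module. The seed, array size, and number of calls below are arbitrary choices, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np

# Fixed seed so the array is reproducible; the timings themselves still vary.
rng = np.random.default_rng(0)
big_array = rng.random(1000)

# Time 1,000 calls of each version and report the average per call.
n_calls = 1000
t_builtin = timeit.timeit(lambda: sum(big_array), number=n_calls) / n_calls
t_numpy = timeit.timeit(lambda: np.sum(big_array), number=n_calls) / n_calls

print(f"built-in sum: {t_builtin * 1e6:.2f} µs per call")
print(f"np.sum:       {t_numpy * 1e6:.2f} µs per call")

# Both versions add the same numbers, so the results agree up to rounding.
print(np.isclose(sum(big_array), np.sum(big_array)))
```

The built-in `sum` iterates over the array one Python object at a time, while `np.sum` runs in compiled code, which is why the gap widens as the array grows.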


In [73]:
np.min(big_array), np.max(big_array)  # NumPy's versions are faster than the built-in min() and max()

(0.0006523407429654959, 0.997605614414985)

In [9]:
M = np.random.random((3, 4))
print(M)

[[0.84674601 0.20116596 0.19303178 0.60834873]
 [0.0396101  0.51942961 0.69508997 0.89897813]
 [0.87636078 0.32840723 0.32519415 0.56437392]]


In [13]:
print(sum(sum(M)))

6.096736375446764


In [11]:
M.sum()

6.096736375446764

## Aggregation functions take an additional argument specifying the axis along which the aggregate is computed. 
For example, we can find the minimum value within each column by specifying axis=0. The axis keyword specifies the dimension of the array that will be collapsed, rather than the dimension that will be returned. So specifying axis=0 means that the first axis will be collapsed: for two-dimensional arrays, this means that values within each column will be aggregated.

![Axis example](axis.jpg)

In [81]:
M.sum(axis=0)

array([1.92833523, 1.1206888 , 1.28624036, 0.90897006])

In [18]:
M.sum(axis=1)

array([1.84929249, 2.15310781, 2.09433608])
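
To make the axis convention concrete, here is a small sketch on a fixed array whose sums are easy to check by hand. The name `M_demo` is only for illustration and is not part of the lab data.

```python
import numpy as np

# A small fixed array so the sums are easy to check by hand.
M_demo = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12]])

col_sums = M_demo.sum(axis=0)  # collapse the rows: one sum per column
row_sums = M_demo.sum(axis=1)  # collapse the columns: one sum per row

print(col_sums)        # [15 18 21 24]
print(row_sums)        # [10 26 42]
print(col_sums.shape)  # (4,)
print(row_sums.shape)  # (3,)
```

The shape of each result shows which axis was collapsed: the 3×4 input loses its first dimension with `axis=0` and its second with `axis=1`.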

The following table provides a list of useful aggregation functions available in NumPy:

|Function Name      |   NaN-safe Version  | Description                                   |
|-------------------|---------------------|-----------------------------------------------|
| ``np.sum``        | ``np.nansum``       | Compute sum of elements                       |
| ``np.prod``       | ``np.nanprod``      | Compute product of elements                   |
| ``np.mean``       | ``np.nanmean``      | Compute mean of elements                      |
| ``np.std``        | ``np.nanstd``       | Compute standard deviation                    |
| ``np.var``        | ``np.nanvar``       | Compute variance                              |
| ``np.min``        | ``np.nanmin``       | Find minimum value                            |
| ``np.max``        | ``np.nanmax``       | Find maximum value                            |
| ``np.argmin``     | ``np.nanargmin``    | Find index of minimum value                   |
| ``np.argmax``     | ``np.nanargmax``    | Find index of maximum value                   |
| ``np.median``     | ``np.nanmedian``    | Compute median of elements                    |
| ``np.percentile`` | ``np.nanpercentile``| Compute rank-based statistics of elements     |
| ``np.any``        | N/A                 | Evaluate whether any elements are true        |
| ``np.all``        | N/A                 | Evaluate whether all elements are true        |
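
As a short illustration of the NaN-safe column in the table, the sketch below uses a made-up array with one missing value. The ordinary aggregate returns `nan`, while the NaN-safe version skips the missing entry.

```python
import numpy as np

# Hypothetical prices with one missing value encoded as NaN.
prices = np.array([10.0, 20.0, np.nan, 30.0])

print(np.mean(prices))     # nan: a single NaN propagates through the ordinary aggregate
print(np.nanmean(prices))  # 20.0: the NaN is skipped
print(np.nansum(prices))   # 60.0
print(np.nanmax(prices))   # 30.0
```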



In [19]:
import pandas as pd
import numpy as np
import requests

In [25]:
date = "20220314"  # trading date in YYYYMMDD form
url = f'https://www.twse.com.tw/exchangeReport/MI_INDEX?response=json&date={date}&type=ALLBUT0999'
response = requests.get(url)
response_json = response.json()
# 'fields9' holds the column names and 'data9' the per-stock rows
stockdata = pd.DataFrame(response_json['data9'], columns=response_json['fields9'])
origin = stockdata.copy()  # keep an untouched copy of the raw table
stockdata.head()

| | 證券代號 | 證券名稱 | 成交股數 | 成交筆數 | 成交金額 | 開盤價 | 最高價 | 最低價 | 收盤價 | 漲跌(+/-) | 漲跌價差 | 最後揭示買價 | 最後揭示買量 | 最後揭示賣價 | 最後揭示賣量 | 本益比 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 50 | 元大台灣50 | 9183817 | 14005 | 1235380453 | 135.0 | 135.35 | 134.0 | 134.4 | `<p> </p>` | 0.0 | 134.4 | 9 | 134.45 | 138 | 0.0 |
| 1 | 51 | 元大中型100 | 49304 | 85 | 2907160 | 59.15 | 59.25 | 58.7 | 59.0 | `<p> </p>` | 0.0 | 58.9 | 1 | 59.1 | 1 | 0.0 |
| 2 | 52 | 富邦科技 | 357322 | 193 | 44264763 | 124.85 | 124.85 | 123.55 | 123.8 | `<p style= color:green>-</p>` | 0.45 | 123.75 | 4 | 123.9 | 1 | 0.0 |
| 3 | 53 | 元大電子 | 8270 | 13 | 534992 | 64.9 | 64.9 | 64.55 | 64.65 | `<p style= color:green>-</p>` | 0.25 | 64.55 | 2 | 64.6 | 1 | 0.0 |
| 4 | 54 | 元大台商50 | 3000 | 3 | 89760 | 29.96 | 30.0 | 29.8 | 29.8 | `<p style= color:green>-</p>` | 0.2 | 29.73 | 1 | 29.82 | 1 | 0.0 |
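
The request above assumes that the response always contains the `fields9` and `data9` keys. As a hedged sketch, the DataFrame construction can be wrapped in a small helper that fails with a clear message when that table is missing. The helper name `payload_to_frame`, its `table` parameter, and the sample payload are hypothetical; only the `fields9`/`data9` layout comes from the cell above.

```python
import pandas as pd


def payload_to_frame(payload, table="9"):
    """Build a DataFrame from one numbered table of a decoded JSON payload.

    Assumes column names live under 'fields<N>' and rows under 'data<N>',
    as the 'fields9'/'data9' keys used above suggest.
    """
    fields_key, data_key = f"fields{table}", f"data{table}"
    if fields_key not in payload or data_key not in payload:
        # The requested table is missing from the response.
        raise KeyError(f"payload has no {fields_key!r}/{data_key!r} table")
    return pd.DataFrame(payload[data_key], columns=payload[fields_key])


# Made-up payload with the same layout (not real market data).
fake_payload = {
    "fields9": ["證券代號", "收盤價"],
    "data9": [["A001", "12.50"], ["A002", "--"]],
}
frame = payload_to_frame(fake_payload)
print(frame.shape)  # (2, 2)
```

With a real response, the call would be `payload_to_frame(response.json())`.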


In [43]:
dayprice = np.array(stockdata['收盤價'])   # 收盤價 is the closing-price column; equivalent: dayprice = np.array(stockdata.收盤價)
print(dayprice)

['134.40' '59.00' '123.80' ... '14.10' '25.00' '107.50']


In [26]:
a = np.array(stockdata.收盤價[0:9])
print(a)

['134.40' '59.00' '123.80' '64.65' '29.80' '24.94' '34.13' '93.70' '20.16']


In [29]:
print(len(a))
print(type(a))

9
<class 'numpy.ndarray'>


In [28]:
a = a.astype(float)  # np.float was removed in NumPy 1.24; the built-in float is equivalent
print(a)

[134.4   59.   123.8   64.65  29.8   24.94  34.13  93.7   20.16]


In [30]:
print(a.mean())

64.95333333333333


In [41]:
print("Mean closing price:", a.mean())
print("Standard deviation of closing price:", a.std())
print("Minimum closing price:    ", a.min())
print("Maximum closing price:    ", a.max())
print("25th percentile:   ", np.percentile(a, 25))
print("Median:            ",  np.median(a))
print("75th percentile:   ", np.percentile(a, 75))

Mean closing price: 64.95333333333333
Standard deviation of closing price: 40.744487016309606
Minimum closing price:     20.16
Maximum closing price:     134.4
25th percentile:    29.8
Median:             59.0
75th percentile:    93.7


In [44]:
# dayprice still holds strings, so the first aggregate below raises a TypeError
print("Mean closing price:", dayprice.mean())
print("Standard deviation of closing price:", dayprice.std())
print("Minimum closing price:    ", dayprice.min())
print("Maximum closing price:    ", dayprice.max())
print("25th percentile:   ", np.percentile(dayprice, 25))
print("Median:            ", np.median(dayprice))
print("75th percentile:   ", np.percentile(dayprice, 75))

TypeError: unsupported operand type(s) for /: 'str' and 'int'
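
The error arises because the array holds text rather than numbers. The sketch below reproduces it on a small made-up object array of strings, which is the kind of array `np.array()` typically returns for a pandas column of text: summing concatenates the strings, and dividing that string by the element count fails.

```python
import numpy as np

# Object-dtype array of strings (made-up values).
prices_text = np.array(["134.40", "59.00", "123.80"], dtype=object)

try:
    prices_text.mean()
except TypeError as err:
    # Summing concatenates the strings; dividing the result by the count fails.
    print("TypeError:", err)

# Converting to float first makes the aggregate meaningful.
prices_num = prices_text.astype(float)
print(prices_num.mean())
```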

In [45]:
print(dayprice[0:40])

['134.40' '59.00' '123.80' '64.65' '29.80' '24.94' '34.13' '93.70' '20.16'
 '64.15' '87.60' '33.96' '32.43' '26.41' '77.30' '--' '130.15' '5.35'
 '40.35' '4.57' '26.24' '22.21' '7.76' '16.00' '9.58' '13.94' '24.58'
 '10.87' '19.50' '16.68' '--' '24.05' '37.01' '62.80' '7.11' '14.57'
 '9.31' '27.50' '40.15' '8.67']


In [46]:
b= np.array(stockdata.收盤價[0:40])

In [47]:
print(b)
print(b.size)

['134.40' '59.00' '123.80' '64.65' '29.80' '24.94' '34.13' '93.70' '20.16'
 '64.15' '87.60' '33.96' '32.43' '26.41' '77.30' '--' '130.15' '5.35'
 '40.35' '4.57' '26.24' '22.21' '7.76' '16.00' '9.58' '13.94' '24.58'
 '10.87' '19.50' '16.68' '--' '24.05' '37.01' '62.80' '7.11' '14.57'
 '9.31' '27.50' '40.15' '8.67']
40


In [48]:
b = b.astype(float)  # fails: the '--' placeholders cannot be parsed as numbers

ValueError: could not convert string to float: '--'

In [50]:
b_new = (b != '--')  # Boolean mask: True where a closing price exists
print(b_new)

[ True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True False  True  True  True  True  True  True  True  True
  True  True  True  True  True  True False  True  True  True  True  True
  True  True  True  True]


In [51]:
b1 = b[b_new].astype(float)  # keep only the valid entries, then convert

In [52]:
print(b1)
print(b1.size)

[134.4   59.   123.8   64.65  29.8   24.94  34.13  93.7   20.16  64.15
  87.6   33.96  32.43  26.41  77.3  130.15   5.35  40.35   4.57  26.24
  22.21   7.76  16.     9.58  13.94  24.58  10.87  19.5   16.68  24.05
  37.01  62.8    7.11  14.57   9.31  27.5   40.15   8.67]
38
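
An alternative to masking by hand, sketched here under the assumption that missing prices should become NaN rather than be dropped, is to let `pd.to_numeric` coerce unparseable entries and then use the NaN-safe aggregates from the table above. The sample values are made up.

```python
import numpy as np
import pandas as pd

# Made-up closing prices; '--' marks a stock with no price.
raw = pd.Series(["134.40", "59.00", "--", "20.16"])

# errors="coerce" turns anything that cannot be parsed into NaN.
values = pd.to_numeric(raw, errors="coerce").to_numpy()

print(values)
print(np.isnan(values).sum())  # 1 missing entry
print(np.nanmean(values))      # mean over the three valid prices
```

This keeps the array the same length as the original column, which can be convenient when positions must stay aligned with other columns.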


# Comparisons, Masks, and Boolean Logic

## Comparison Operators as ufuncs
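NumPy implements comparison operators such as `<` (less than) and `>` (greater than) as element-wise ufuncs. The result of these comparison operators is always an array with a Boolean data type. All six of the standard comparison operations are available: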

In [53]:
x = np.array([1, 2, 3, 4, 5])

In [54]:
x < 3  # less than

array([ True,  True, False, False, False])

In [55]:
x > 3  # greater than

array([False, False, False,  True,  True])

In [56]:
x <= 3  # less than or equal

array([ True,  True,  True, False, False])

In [57]:
x >= 3  # greater than or equal

array([False, False,  True,  True,  True])

In [58]:
x != 3  # not equal

array([ True,  True, False,  True,  True])

In [59]:
x == 3  # equal

array([False, False,  True, False, False])

In [60]:
(2 * x) == (x ** 2)

array([False,  True, False, False, False])
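
Each comparison operator is shorthand for a NumPy ufunc. The short check below, which reuses the same one-dimensional `x`, shows that the operator form and the function form give identical results.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# Operator form versus the equivalent ufunc call.
print(np.array_equal(x < 3, np.less(x, 3)))            # True
print(np.array_equal(x >= 3, np.greater_equal(x, 3)))  # True
print(np.array_equal(x != 3, np.not_equal(x, 3)))      # True

# The result is a Boolean array with the same shape as the input.
print((x < 3).dtype)  # bool
print((x < 3).shape)  # (5,)
```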

## Working with Boolean Arrays

Given a Boolean array, there are a host of useful operations you can do.
We'll work with ``x``, a two-dimensional array created in the next cell.

In [61]:
x=np.arange(12).reshape(3,4)
print(x)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


In [62]:
# how many values less than 6?
np.count_nonzero(x < 6)

6

In [110]:
# how many values less than 6 in each row?
np.sum(x<6 , axis=1)

array([4, 2, 0])

In [111]:
# are there any values greater than 8?
np.any(x > 8)

True

In [64]:
# are all values less than 12?
np.all(x < 12)

True

In [63]:
np.where(x==7)

(array([1], dtype=int64), array([3], dtype=int64))
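
Beyond counting, a Boolean array can be used as a mask to pull out the matching values themselves. The following sketch uses the same 3×4 array; `np.argwhere` is shown as one way to list matching positions as (row, column) pairs.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

mask = x < 6        # Boolean array with the same shape as x
print(x[mask])      # [0 1 2 3 4 5]: masking returns a 1-D array of the selected values
print(mask.mean())  # 0.5: fraction of entries that satisfy the condition

# np.argwhere lists the (row, column) position of every match.
print(np.argwhere(x == 7))  # [[1 3]]
```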

## Analyzing the Day's Stock Prices

In [65]:
dayprice = np.array(stockdata.收盤價)

In [66]:
dayprice_yes = (dayprice != '--')  # Boolean mask: True where a closing price exists
print(dayprice_yes[0:40])

[ True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True False  True  True  True  True  True  True  True  True
  True  True  True  True  True  True False  True  True  True  True  True
  True  True  True  True]


In [67]:
dayprice = dayprice[dayprice_yes].astype(float)

ValueError: could not convert string to float: '2,015.00'

In [68]:
np.where(dayprice=='2,015.00')

(array([696], dtype=int64),)

In [70]:
dayprice[696]

'2,015.00'

In [71]:
dayprice[696]='2015'

In [72]:
dayprice[696]

'2015'

In [80]:
dayprice = dayprice[dayprice_yes].astype(float)

ValueError: could not convert string to float: '1,805.00'

In [101]:
np.where(dayprice=='1,270.00')

(array([1053], dtype=int64),)

In [103]:
dayprice[1053]='1270'

In [104]:
dayprice = dayprice[dayprice_yes].astype(float)
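
Fixing each comma-formatted price by index works, but it has to be repeated for every such value. A possible alternative, sketched here on a made-up column, is to strip the thousands separators from all entries at once with the pandas string methods before converting.

```python
import numpy as np
import pandas as pd

# Made-up column mixing plain prices, thousands separators, and '--'.
closing = pd.Series(["134.40", "2,015.00", "--", "1,270.00", "59.00"])

valid = closing != "--"                                      # mask out missing prices
cleaned = closing[valid].str.replace(",", "", regex=False)   # '2,015.00' -> '2015.00'
prices = cleaned.astype(float).to_numpy()

print(prices)
print(prices.size)   # 4
print(prices.max())  # 2015.0
```

On the lab data, the same steps would be applied to `stockdata['收盤價']` in place of the sample series.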

In [105]:
print("Mean closing price for the day:", dayprice.mean())
print("Standard deviation of closing price for the day:", dayprice.std())
print("Minimum closing price for the day:    ", dayprice.min())
print("Maximum closing price for the day:    ", dayprice.max())
print("25th percentile:   ", np.percentile(dayprice, 25))
print("Median:            ", np.median(dayprice))
print("75th percentile:   ", np.percentile(dayprice, 75))

Mean closing price for the day: 71.13920245398774
Standard deviation of closing price for the day: 170.09056310321776
Minimum closing price for the day:     1.25
Maximum closing price for the day:     3075.0
25th percentile:    19.6
Median:             34.0
75th percentile:    64.0


## Assignment 2 (Due 3/29): Using `date = "2022/03/15"`, analyze the mean and standard deviation of the stock market's opening and closing prices.