notes-with-devika.txt
already some preprocessing done; something to keep in mind
something to keep in mind
shift, but there is some overlap
but that overlap happens by definition
depending on where you are in supersymmetric space, the overlap can be really bad
significance: #(signal)/sqrt(#background)
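As a minimal sketch of the significance definition above, the function below computes the number of signal events divided by the square root of the number of background events. The function name and the example counts are made up for illustration; they are not from the analysis.

```python
import math

def significance(n_signal, n_background):
    """Return #(signal) / sqrt(#background) for events passing a selection."""
    if n_background <= 0:
        raise ValueError("background count must be positive")
    return n_signal / math.sqrt(n_background)

# Hypothetical counts, for illustration only
print(significance(30, 100))  # 30 / sqrt(100) = 3.0
```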
something to do:
do cut and count and see if the line is to the right of the point (mva_bdt_rejBvsS;1)
separation looks too nice for training data
only using 1/20th of the data
experiment:
progressively double the size of your data set
100, 200, 400, 800, ...
see what the trend is in terms of overtraining
the more points you have, the more attuned you are to those specific data points
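The doubling experiment above could be sketched roughly as follows. `train_and_score` is a hypothetical callable standing in for whatever actually trains the BDT on a given number of points and returns a (train score, test score) pair; the gap between the two is used here as a simple stand-in for overtraining, which is an assumption rather than the measure agreed in the meeting.

```python
def doubling_sizes(start, maximum):
    """Return [start, 2*start, 4*start, ...] up to and including maximum."""
    if start <= 0:
        raise ValueError("start must be positive")
    sizes = []
    size = start
    while size <= maximum:
        sizes.append(size)
        size *= 2
    return sizes

def overtraining_trend(sizes, train_and_score):
    """Return (size, train_score - test_score) for each data set size.

    train_and_score is a hypothetical callable: size -> (train_score, test_score).
    """
    trend = []
    for size in sizes:
        train_score, test_score = train_and_score(size)
        trend.append((size, train_score - test_score))
    return trend

print(doubling_sizes(100, 1000))  # [100, 200, 400, 800]
```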
change balance between train and test
current config: 50% training/50% test
another possible config: 80% training/20% test
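To make the two train/test balances above concrete, here is a small sketch of a shuffled split with a configurable training fraction. The function name, the fixed seed, and the use of integers as stand-in events are assumptions for illustration only; the real split is presumably handled by the training framework.

```python
import random

def split_train_test(events, train_fraction=0.5, seed=0):
    """Shuffle events reproducibly and split them into (train, test) lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]

events = range(1000)  # stand-in for real events
train, test = split_train_test(events, train_fraction=0.5)
print(len(train), len(test))  # 500 500
train, test = split_train_test(events, train_fraction=0.8)
print(len(train), len(test))  # 800 200
```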
todo:
1000 data points and 10000 data points
stick with 50/50
run PCA and get two most important components
email plot to devika
color points so background and signal are distinguished
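For the PCA item in the todo list above, one possible sketch uses NumPy's SVD to project each event onto its two leading principal components. The function name and the toy feature matrix are hypothetical. Plotting is left out; the two projected columns can be passed to any scatter plot, colored by a signal/background label.

```python
import numpy as np

def project_onto_two_components(features):
    """Return (projected, components) for the two leading principal components.

    features: array-like of shape (n_events, n_variables).
    """
    X = np.asarray(features, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:2]
    projected = centered @ components.T
    return projected, components

# Toy data, for illustration only: 10 events with 3 variables
toy = [[t, t, 0.1 * (-1) ** t] for t in range(10)]
projected, components = project_onto_two_components(toy)
print(projected.shape)  # (10, 2)
```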
todo:
number of trees should be about sqrt(number of data points)
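The rule of thumb above, applied to the two data set sizes mentioned earlier in these notes (1000 and 10000 points), could look like this. The function name is hypothetical, and rounding to the nearest integer is an assumption.

```python
import math

def suggested_number_of_trees(n_points):
    """Rule of thumb from the notes: about sqrt(number of data points)."""
    if n_points <= 0:
        raise ValueError("n_points must be positive")
    return max(1, round(math.sqrt(n_points)))

for n in (1000, 10000):
    print(n, suggested_number_of_trees(n))
# 1000 32
# 10000 100
```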
show on same plot:
where are humans at this?
and how well is bdt doing
this approach is very similar to medical decision making
suggested future work:
change bdt computations
not worried about confusion matrix
plot significance vs. number of trees
normalize all variables?
look into documentation and c++ code to see if the normalization is already being done
0. write
1. compare cut and count
100t350_8TeV-51.txt: Efficiency = 3.17%
100t400_8TeV-52.txt: Efficiency = 3.83%
100t400_8TeV-55.txt: Efficiency = 3.95%
100t450_8TeV-51.txt: Efficiency = 3.34%
ttj006f38-9.txt: Rejection = 98.26%
//2. adjust number of trees
3. other susy points
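The cut-and-count numbers above can be turned into the significance measure from earlier in these notes. This is a sketch under two assumptions: the Efficiency lines are signal efficiencies, and the Rejection line is the background rejection. The event yields before cuts are hypothetical placeholders, not values from the analysis, so the printed number is illustrative only.

```python
import math

def expected_significance(signal_efficiency, background_rejection,
                          n_signal_before, n_background_before):
    """Return S / sqrt(B) after a selection.

    S = signal_efficiency * n_signal_before
    B = (1 - background_rejection) * n_background_before
    """
    n_signal = signal_efficiency * n_signal_before
    n_background = (1.0 - background_rejection) * n_background_before
    if n_background <= 0:
        raise ValueError("no background left after selection")
    return n_signal / math.sqrt(n_background)

# Efficiency (3.17%) and rejection (98.26%) are taken from the notes;
# the yields before cuts (1000 and 100000) are hypothetical.
value = expected_significance(0.0317, 0.9826, 1000.0, 100000.0)
print(f"S/sqrt(B) = {value:.2f}")
```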