logistic_regression (forked from nlintz/TensorFlow-Tutorials)
//=== https://en.wikipedia.org/wiki/Logistic_distribution
In probability theory and statistics, the logistic distribution is a continuous probability distribution.
Its cumulative distribution function is the logistic function,
which appears in logistic regression and feedforward neural networks.
It resembles the normal distribution in shape but has heavier tails (higher kurtosis).
The Tukey lambda distribution can be considered a generalization of the logistic distribution,
since it adds a shape parameter λ (the Tukey lambda distribution reduces to the logistic when λ is zero).
The logistic distribution also arises as the limiting distribution of
a finite-velocity damped random motion described by a telegraph process in which
the random times between consecutive velocity changes
have independent exponential distributions with linearly increasing parameters.[4]
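The heavier-tails claim can be checked numerically. A minimal sketch in plain Python (standard library only; the scale and evaluation point below are arbitrary), comparing the logistic tail against a normal distribution of the same variance:

```python
import math
from statistics import NormalDist

def logistic_cdf(x, mu=0.0, s=1.0):
    """CDF of the logistic distribution: a shifted/scaled logistic function."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / s))

# The CDF is symmetric around mu, so F(mu) = 0.5.
print(logistic_cdf(0.0))  # 0.5

# Heavier tails than a normal of the SAME variance (Var = s^2 * pi^2 / 3):
s = 1.0
matched_sigma = s * math.pi / math.sqrt(3)
x = 6.0
tail_logistic = 1.0 - logistic_cdf(x, s=s)
tail_normal = 1.0 - NormalDist(0.0, matched_sigma).cdf(x)
print(tail_logistic > tail_normal)  # True: the logistic tail is larger
```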
//=== https://en.wikipedia.org/wiki/Bernoulli_distribution
The Bernoulli distribution, named after the Swiss scientist Jacob Bernoulli,[1]
is the probability distribution of a random variable
that takes the value 1 with success probability p and
the value 0 with failure probability q = 1 - p.
It can be used to represent a coin toss, where 1 and 0 represent "heads" and "tails"
(or vice versa), respectively.
In particular, an unfair coin has p \neq 0.5.
The Bernoulli distribution is a binomial distribution with n = 1.
If X_1, \dots, X_n are
independent, identically distributed (i.i.d.) random variables,
all Bernoulli distributed with success probability p, then
Y = \sum_{k=1}^{n} X_k --> Binomial(n, p) (binomial distribution).
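The last fact (a sum of i.i.d. Bernoulli draws is binomial) can be checked by simulation; a minimal sketch in plain Python, with n, p, and the trial count chosen arbitrarily:

```python
import random

random.seed(0)
n, p, trials = 20, 0.3, 10_000

def bernoulli(p):
    """One Bernoulli(p) draw: 1 with probability p, else 0."""
    return 1 if random.random() < p else 0

# Y = X_1 + ... + X_n with X_k i.i.d. Bernoulli(p) is Binomial(n, p),
# so the sample mean of Y should be close to E[Y] = n*p.
samples = [sum(bernoulli(p) for _ in range(n)) for _ in range(trials)]
mean_y = sum(samples) / trials
print(mean_y)  # close to n*p = 6.0
```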
//=== https://en.wikipedia.org/wiki/Logistic_regression
In statistics, logistic regression (logit regression, or the logit model) is
a regression model where the dependent variable (DV) is categorical.
Logistic regression was developed by the statistician David Cox in 1958.
* it is not a classification method per se --> it is a regression model for probabilities;
  a classifier is obtained only by thresholding the predicted probability.
It could be called a qualitative-response/discrete-choice model in the terminology of economics.
Logistic regression measures the relationship between
the categorical dependent variable and one or more independent variables
by estimating probabilities using a logistic function, which is the cumulative distribution function of the logistic distribution.
*** "Logistic regression" assumes a standard logistic distribution of errors, while
"probit regression" assumes a standard normal distribution of errors.
Logistic regression can be seen as a special case of the generalized linear model,
but the model of logistic regression is based on quite different assumptions
(about the relationship between the dependent and independent variables)
from those of linear regression.
The key differences between these two models can be seen in the following two features of logistic regression.
First, the conditional distribution y|x is a Bernoulli distribution rather than a Gaussian distribution, because the dependent variable is binary.
Second, the predicted values are probabilities and are therefore restricted to (0, 1)
through the logistic distribution function,
because logistic regression predicts the probability of particular outcomes.
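These two features (a Bernoulli conditional distribution, and predictions restricted to (0, 1)) can be illustrated with a small from-scratch fit. The sketch below uses made-up synthetic data and plain gradient ascent on the Bernoulli log-likelihood (not any particular library's solver); the true coefficients, learning rate, and iteration count are arbitrary choices:

```python
import math
import random

random.seed(1)

def sigma(t):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-t))

# Synthetic data: y | x is Bernoulli with P(y=1|x) = sigma(-1 + 2x).
xs = [random.uniform(-3, 3) for _ in range(1000)]
ys = [1 if random.random() < sigma(-1.0 + 2.0 * x) else 0 for x in xs]

# Gradient ascent on the average Bernoulli log-likelihood.
b0, b1, lr = 0.0, 0.0, 0.5
for _ in range(500):
    g0 = sum(y - sigma(b0 + b1 * x) for x, y in zip(xs, ys)) / len(xs)
    g1 = sum((y - sigma(b0 + b1 * x)) * x for x, y in zip(xs, ys)) / len(xs)
    b0, b1 = b0 + lr * g0, b1 + lr * g1

# Estimates land near the true (-1, 2); any prediction is a probability in (0, 1).
p_hat = sigma(b0 + b1 * 0.5)
print(b0, b1, p_hat)
```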
//=== "latent"
The error term \epsilon is not observed, and so the y\prime is also an unobservable,
hence termed "latent". (The observed data are values of y and x)
Unlike ordinary regression, however, the \beta parameters cannot be expressed
by any direct formula of the y and x values in the observed data.
y=\beta_0 + beta_1*x
y'= y + \epsilon = b0 + b1*x + err
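A simulation makes the latent-variable story concrete: when \epsilon has a standard logistic distribution, thresholding the latent y' at zero reproduces exactly the logistic probability. The coefficients and x below are arbitrary illustrative values:

```python
import math
import random

random.seed(2)

def sigma(t):
    return 1.0 / (1.0 + math.exp(-t))

def logistic_noise():
    """Standard logistic draw via inverse-CDF sampling (logit of a uniform)."""
    u = random.random()
    return math.log(u / (1.0 - u))

b0, b1, x = 0.5, 1.5, 0.2   # arbitrary illustrative values
trials = 100_000

# Latent y' = b0 + b1*x + eps is never observed; we only see y = 1[y' > 0].
hits = sum((b0 + b1 * x + logistic_noise()) > 0 for _ in range(trials))
empirical = hits / trials

# P(y=1 | x) = P(eps > -(b0 + b1*x)) = sigma(b0 + b1*x), by symmetry of eps.
print(empirical, sigma(b0 + b1 * x))
```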
//=== logistic function (sigma)
sigma: (-inf, inf) --> (0, 1)
sigma(t) = 1 / (1 + exp(-t))
t = \beta_0 + \beta_1 x
* the probability of success can then be written with the logistic function:
F(x) = 1 / [1 + exp(-(\beta_0 + \beta_1 x))]
* logit function (log odds) (inverse of the logistic function):
g(F(x)) = ln[F(x) / (1 - F(x))] = \beta_0 + \beta_1 x
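That the logit is the inverse of the logistic function is easy to verify numerically; a minimal round-trip check in plain Python (test points chosen arbitrarily):

```python
import math

def sigma(t):
    """Logistic function: (-inf, inf) --> (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

def logit(p):
    """Log-odds: (0, 1) --> (-inf, inf); inverse of sigma."""
    return math.log(p / (1.0 - p))

# Round trip: logit(sigma(t)) recovers t (up to floating-point error).
for t in (-3.0, 0.0, 2.5):
    print(t, logit(sigma(t)))
```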
//=== https://en.wikipedia.org/wiki/Dependent_and_independent_variables
In mathematical modelling and statistical modelling,
there are dependent and independent variables.
The models investigate how the former depend on the latter.
The dependent variables represent the output or outcome;
the independent variables represent inputs or causes.
"Explanatory variable" is preferred by some authors over "independent variable"
when the quantities treated as "independent variables" may not be statistically independent.
"Explained variable" is preferred by some authors over "dependent variable"
when the quantities treated as "dependent variables" may not be statistically dependent.
If the dependent variable is referred to as an "explained variable",
then the term "predictor variable" is preferred by some authors for the independent variable.
*** controlled var, extraneous var
A controlled variable is an independent variable that is kept constant or monitored to try to minimise its effect on the experiment.
Such variables may be designated as a "controlled variable", "control variable", or "extraneous variable".
A variable is extraneous only when it can be assumed (or shown) to influence the dependent variable.
If included in a regression, it can improve the fit of the model.
If it is excluded from the regression, and
if it has a non-zero covariance with one or more of the independent variables of interest,
its omission will bias the regression's result for the effect of that independent variable of interest.
--> This effect is called confounding or omitted-variable bias;
in these situations, design changes and/or statistical control is necessary.
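The omitted-variable-bias mechanism can be demonstrated by simulation. The sketch below (plain Python, made-up coefficients) generates y from x and a confounder z, then regresses y on x alone; the one-regressor slope absorbs part of z's effect:

```python
import random

random.seed(3)
n = 50_000

# z confounds x and y: it drives both (coefficients here are made up).
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]          # cov(x, z) = 1, var(x) = 2
y = [1.0 * xi + 2.0 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

def ols_slope(xs, ys):
    """Slope of the simple (one-regressor) OLS fit of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

# Omitting z biases the coefficient on x upward:
# plim = beta_x + beta_z * cov(x, z) / var(x) = 1 + 2 * (1/2) = 2.
biased = ols_slope(x, y)
print(biased)  # near 2.0, not the true beta_x = 1.0
```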