How do you compare the Perceptron and Backpropagation techniques? Please explain them to me, as I don't have any knowledge about them.
In general, Backpropagation is a fundamental algorithm used to train perceptrons and other artificial neural networks (ANNs), so perhaps we can't say the Perceptron and the Backpropagation technique are comparable. But I'll try my best to explain them.
The perceptron can approximate a complex function between the input and output. For example, a perceptron can be trained to identify hand-written digits: the input will be the value of each pixel of the image, the output will be one of the digits 0-9, and the mapping between them is clearly quite complicated. It is based on the idea of an artificial neuron, which takes in an $N$-dimension input vector $\{ x_0, x_1,...,x_{N-1} \}$. It then performs a weighted sum with a bias on the input, obtaining $x_0 w_0 + x_1 w_1 +...+x_{N-1} w_{N-1} + b$. The sum is then fed to an activation function, which is usually non-linear, to get the final output $y$. To get an $M$-dimension output, we can build a layer made up of $M$ separate neurons, feed the same $N$-dimension input to each of them, and collect each one's output to form a new vector $\{ y_0, y_1,..., y_{M-1} \}$. The input, the output, and the neuron layer form a single-layer perceptron. By combining several layers of neurons, where one layer's output is the next layer's input, we get a multi-layer perceptron.

This also explains why the activation function is usually non-linear: the biased weighted sum is itself linear, so with linear activations the whole multi-layer perceptron would just perform one overall linear computation on the input, which is no more expressive than a single biased weighted sum. With non-linear activation functions, on the other hand, multi-layer perceptrons can approximate complex functions.
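To make the forward computation concrete, here is a minimal sketch in Python/NumPy; the sigmoid activation, the layer sizes, and the random initialization are my own choices for illustration:

```python
import numpy as np

def sigmoid(z):
    # A common non-linear activation function.
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(x, W, b):
    # x: N-dimensional input vector
    # W: M x N weight matrix (one row of N weights per neuron)
    # b: M-dimensional bias vector
    # Each neuron computes a biased weighted sum, then applies the activation.
    z = W @ x + b          # weighted sums, shape (M,)
    return sigmoid(z)      # outputs y_0 ... y_{M-1}

# A tiny multi-layer perceptron: 3 inputs -> 4 hidden neurons -> 2 outputs
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

hidden = layer_forward(x, W1, b1)       # first layer's output ...
output = layer_forward(hidden, W2, b2)  # ... is the second layer's input
print(output)
```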
So how can we adjust the weights and biases in the neurons to approximate the expected function? We first need some - usually a lot of - existing inputs and outputs to train it by minimizing the error between the expected and actual output (one common choice of error function is sketched below). This is like showing someone the correct way to do something in order to teach them. The training method is called the Backward Stage, as opposed to the Forward Stage, which starts from the input and goes all the way through the perceptron's weighted sums and activation functions to the output.
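One common choice of error function (the formulas below work with any differentiable choice, so this is just an illustration) is the mean squared error between the expected and actual outputs:

```python
import numpy as np

def mean_squared_error(expected, actual):
    # Average of the squared differences between expected and actual outputs.
    expected, actual = np.asarray(expected), np.asarray(actual)
    return np.mean((expected - actual) ** 2)

# e.g. an expected label vector vs. the perceptron's actual output
print(mean_squared_error([0.0, 1.0], [0.2, 0.7]))  # 0.065
```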
This part is more difficult and requires some calculus, so be prepared. Given an $L$-layer perceptron, let $C$ be the error, $z^l_j$ be the $j$-th weighted sum of the $l$-th layer, $w^l_{jk}$ and $b^l_j$ be the corresponding weights and bias, $a^l_j = f^l(z^l_j)$ be the corresponding output, and $m^l$ be the number of neurons in layer $l$. We get $z^l_j = \sum ^{m^{l-1}} _{k=1} w^l_{jk} a^{l-1}_k + b^l_j$. To minimize $C$, we repeatedly use gradient descent to update each weight and bias:
$w = w - \epsilon \frac{\partial C}{\partial w}$, $b = b - \epsilon\frac{\partial C}{\partial b}$, with $\epsilon$ being the learning rate.
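As a quick numeric sketch of one such update (all the values here are made up for illustration):

```python
# One gradient-descent step on a single weight (illustrative values).
epsilon = 0.1          # learning rate
w       = 0.50         # current weight
dC_dw   = 0.20         # partial derivative of the error w.r.t. this weight

w = w - epsilon * dC_dw
print(w)  # 0.48 -- the weight moves against the gradient, reducing the error
```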
Now the task is to find the partial derivative, or local gradient, for each weight and bias by using the chain rule:
$\frac{\partial C}{\partial w^l_{jk}} = \frac{\partial C}{\partial z^l_j} \frac{\partial z^l_j}{\partial w^l_{jk}} =  \frac{\partial C}{\partial z^l_j} a^{l-1}_k$
$\frac{\partial C}{\partial b^l_j} = \frac{\partial C}{\partial z^l_j} \frac{\partial z^l_j}{\partial b^l_j} = \frac{\partial C}{\partial z^l_j}$
and:
$\frac{\partial C}{\partial z^l_j} = \frac{\partial C}{\partial a^l_j} \frac{\partial a^l_j}{\partial z^l_j} = \frac{\partial C}{\partial a^l_j} f^{l}'(z^l_j)$
$\frac{\partial C}{\partial a^l_j} = \sum _{k=1} ^{m^{l+1}} w^{l+1}_{kj} \frac{\partial C}{\partial z^{l+1}_k}$
So we can first run a Forward Stage to get the $a$ and $z$ values. Next, we begin with the final outputs $a^{L-1}_j$ at the last layer $L-1$ (numbering the layers from $0$ to $L-1$), calculate $\frac{\partial C}{\partial a^{L-1}_j}$ and $\frac{\partial C}{\partial z^{L-1}_j}$, then use them to calculate $\frac{\partial C}{\partial a^{L-2}_j}$ and $\frac{\partial C}{\partial z^{L-2}_j}$, and repeat this all the way back to the first layer. With these two partial derivatives determined for each layer, we can calculate the local gradient for each weight and bias of each layer. By repeating the Forward Stage - Backward Stage cycle many times, the error can be minimized. Since the calculation of local gradients goes from the last layer back to the first, we call this Backpropagation.
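Putting the two stages together, here is a minimal end-to-end sketch in Python/NumPy that trains a tiny multi-layer perceptron on the XOR problem; the network size, sigmoid activation, squared-error cost, learning rate, and number of epochs are all illustrative choices of mine, not something fixed by the explanation above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

rng = np.random.default_rng(0)

# XOR: 4 training inputs (2-dimensional) and their expected outputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

# A 2-layer perceptron: 2 inputs -> 4 hidden neurons -> 1 output.
sizes = [2, 4, 1]
W = [rng.normal(size=(sizes[l + 1], sizes[l])) for l in range(len(sizes) - 1)]
b = [np.zeros(sizes[l + 1]) for l in range(len(sizes) - 1)]
epsilon = 0.5  # learning rate

for epoch in range(10000):
    for x, t in zip(X, T):
        # ---- Forward Stage: store every z (weighted sum) and a (output) ----
        a = [x]   # a[0] is the input; a[l + 1] is the output of layer l
        zs = []
        for Wl, bl in zip(W, b):
            z = Wl @ a[-1] + bl
            zs.append(z)
            a.append(sigmoid(z))

        # ---- Backward Stage: from the last layer back to the first ----
        # Squared-error cost C = 1/2 * sum (a - t)^2, so dC/da = a - t at the output.
        dC_da = a[-1] - t
        for l in reversed(range(len(W))):
            dC_dz = dC_da * sigmoid_prime(zs[l])   # dC/dz^l_j = dC/da^l_j * f'(z^l_j)
            dC_dW = np.outer(dC_dz, a[l])          # dC/dw^l_{jk} = dC/dz^l_j * a^{l-1}_k (a[l] is a^{l-1})
            dC_db = dC_dz                          # dC/db^l_j  = dC/dz^l_j
            dC_da = W[l].T @ dC_dz                 # dC/da^{l-1}_j, used by the next step back
            # Gradient-descent update for this layer's weights and bias.
            W[l] -= epsilon * dC_dW
            b[l] -= epsilon * dC_db

# After training, the outputs should be close to 0, 1, 1, 0.
for x in X:
    out = x
    for Wl, bl in zip(W, b):
        out = sigmoid(Wl @ out + bl)
    print(x, out)
```

Since the weights start from random values, a run can occasionally get stuck in a poor local minimum; re-running with a different seed usually fixes that.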
I hope this can help you better understand these concepts.