* The internet is used by billions of people around the world. Web browsing raises several everyday problems, such as security concerns, slow response times, and the need to restrict sensitive content for some users.
* Because so many people use the internet at once, traffic intensity rises on the links that carry data between servers and clients. As a result, the average response time, that is, the time from the browser's request for an object until it receives that object, increases and under heavy load can reach the order of minutes, which is undesirable to users. A proxy server can address this problem by implementing caching, which reduces both the amount of traffic and the delay and so yields a quicker response time. Through the use of CDNs and the localization of much of the traffic, web caches are becoming an increasingly significant part of the internet.
* Using a proxy server, one can also implement parental control (blocking sensitive websites).
KEYWORDS: Blocking websites, Caching, Traffic Intensity, Response Time, Delay.
- Proxy servers are typically employed as a gateway between a user and the internet, sitting between visitors' browsers and the websites they visit. A computer uses an IP address to connect to the internet, and a proxy server is likewise an internet-connected machine with its own IP address. Because remote sites see the proxy's address rather than the client's, the proxy helps prevent online intruders from accessing a private network.
- Network Address Translation, often known as NAT, is frequently combined with proxies to mask the client's IP address from the distant server.
Application Level Proxy: A proxy dedicated to a single application protocol, such as an HTTP web proxy or an FTP proxy. Each one is committed to a particular task. Circuit Level Proxies, by contrast, can support a variety of applications.
Forward proxies are located near the client and conceal details of the client from the server, so the remote server does not know the true identity of the requesting client. Reverse proxies, on the other hand, are located close to the server and conceal information about the server from clients, so the client does not know which origin server actually handled its request.
These are anonymous proxy servers that are publicly accessible online and help to conceal users' identities.
A Web Caching Proxy Server keeps copies of recently requested objects in its own disk storage. The user's web browser can be configured so that all HTTP requests go to the web proxy first. Once a request is directed to the web proxy server, the following steps complete the transfer:
- The browser opens a TCP connection to the web cache and sends it an HTTP request for the object.
- The web cache checks whether a copy of the object is stored locally. If a copy is present, the web cache returns the object to the client browser in an HTTP response message.
- If the web cache does not hold the object, it opens a TCP connection to the origin server and sends an HTTP request for the object over that connection. After receiving this request, the origin server returns the object to the web cache in an HTTP response.
- After receiving the object, the web cache stores a copy locally and sends another copy, in an HTTP response message, to the client browser over the already established TCP connection between cache and client.
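The hit-or-miss decision in the steps above can be sketched in a few lines of Java. This is a minimal illustration rather than the project's actual code: the class `WebCacheSketch`, its `originFetcher` parameter (a stand-in for the TCP exchange with the origin server) and the `originRequests` counter are all hypothetical names, and the cache lives in memory instead of on disk.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical in-memory web cache; all names here are illustrative. */
class WebCacheSketch {
    // Cached objects keyed by URL. A plain HashMap is not thread-safe,
    // which is acceptable for this single-threaded illustration.
    private final Map<String, byte[]> store = new HashMap<>();
    // Stand-in for "connect to the origin server and fetch the object".
    private final Function<String, byte[]> originFetcher;
    // Counts how often the origin server had to be contacted.
    int originRequests = 0;

    WebCacheSketch(Function<String, byte[]> originFetcher) {
        this.originFetcher = originFetcher;
    }

    /** Returns the object for the URL, contacting the origin only on a miss. */
    byte[] get(String url) {
        byte[] cached = store.get(url);
        if (cached != null) {
            return cached;                        // hit: answer from the local copy
        }
        originRequests++;
        byte[] fresh = originFetcher.apply(url);  // miss: fetch from the origin
        store.put(url, fresh);                    // keep a copy for later requests
        return fresh;
    }

    public static void main(String[] args) {
        WebCacheSketch cache = new WebCacheSketch(
                url -> ("body of " + url).getBytes(StandardCharsets.UTF_8));
        cache.get("http://example.com/");   // miss, goes to the origin
        cache.get("http://example.com/");   // hit, served locally
        System.out.println("origin requests: " + cache.originRequests); // origin requests: 1
    }
}
```

Passing the origin fetch in as a function keeps the sketch free of network code, so the caching logic can be read and tested on its own.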
The benefits of the proxy cache grow when multiple clients share the proxy: a page cached for one client can be served to other clients with dramatically reduced response times.
Because they can enforce access control, web proxies are highly useful in private networks, such as those found in institutions, to prohibit users from accessing specific websites, and also for parental control. A web proxy can block websites if it is given a list of sites to ban. When it examines a client request, the proxy checks the destination URL against the list of prohibited websites and refuses access if it matches.
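A block-list check of this kind might look like the sketch below. It assumes that matching is done on the host part of the URL, which the text does not specify; `BlockListSketch` and its methods are hypothetical names.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical block list that matches on the host part of the requested URL. */
class BlockListSketch {
    private final Set<String> blockedHosts = new HashSet<>();

    /** Adds a host to the list; host names are compared case-insensitively. */
    void block(String host) {
        blockedHosts.add(host.toLowerCase(Locale.ROOT));
    }

    /** Returns true if the URL's host is on the block list. */
    boolean isBlocked(String url) {
        String host = URI.create(url).getHost();   // null if the URL has no host part
        return host != null && blockedHosts.contains(host.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        BlockListSketch list = new BlockListSketch();
        list.block("example.com");
        System.out.println(list.isBlocked("http://example.com/index.html")); // true
        System.out.println(list.isBlocked("http://example.org/"));           // false
    }
}
```

A hash-based set keeps the check cheap even for a long list, which matters because it runs on every request.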
When used in conjunction with NAT, web proxies hide the client's IP address by replacing it with an external IP address and making requests on the client's behalf.
HTTPS connections use SSL or TLS, so the client and server exchange encrypted data. This is widely employed in the financial sector to ensure secure transactions and is now present practically everywhere.
Plain HTTP, by contrast, creates a problem when a proxy server is involved: the proxy can read all the information and services that clients request from the server, because there is no secure channel of communication between the two. Since these proxy servers also cache recently accessed web pages and their traffic, there is a higher risk of a Man In The Middle attack.
However, these proxy servers are unable to cache HTTPS traffic or to block it based on its content, because the payload is encrypted.
Here, instead of caching objects, the proxy relays a secure connection between the client and the server. This is implemented using HTTP CONNECT tunneling.
Once the proxy server program has been launched from a command prompt, a web page can be requested from the browser. The requests are routed to the proxy server using the IP address and port number on which the proxy server program is listening.
To use the proxy with the browser and the proxy on separate computers, replace localhost with the IP address of the machine on which the proxy server code is running. The port number must also match the one that the proxy server is listening on.
IP Address used here: localhost
Port number: 6969
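A Java client, rather than a browser, could be pointed at the same address and port roughly as follows. This is only a sketch: `ProxyClientSketch` and `localProxy` are hypothetical names, the target URL is a placeholder, and `main` assumes a proxy is actually listening on localhost:6969.

```java
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;

/** Hypothetical client that routes one HTTP request through the proxy. */
class ProxyClientSketch {
    /** Describes an HTTP proxy at the given host and port. */
    static Proxy localProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, port));
    }

    public static void main(String[] args) throws Exception {
        // Same settings as entered in the browser: localhost, port 6969.
        Proxy proxy = localProxy("localhost", 6969);
        HttpURLConnection conn = (HttpURLConnection)
                URI.create("http://example.com/").toURL().openConnection(proxy);
        // The connection is only opened here, through the proxy.
        System.out.println("Status via proxy: " + conn.getResponseCode());
    }
}
```

To reach a proxy on another machine, only the host argument changes; the port must still match the one the proxy listens on.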
Traffic from the client is sent to the proxy server, which then requests the data from the distant server on the client's behalf and forwards the response to the client. This is helpful if the administrator wants to control what data the user can access. For instance, clients cannot reach particular websites if the proxy server blocks them dynamically: the client can specify the website URL to block, and access to that website is limited from then on. In the case of parental control, this helps limit the content that users or children can see.
The proxy is implemented in Java using its TCP socket libraries. Firefox was configured to send all of its traffic to the IP address (localhost) and port (6969) given in the proxy configuration.
Main components of the implementation:
- Proxy class
- RequestHandler class
To receive incoming socket connections from clients, the Proxy class creates a ServerSocket. Because the server must serve numerous clients at once, the implementation is multithreaded: the Proxy accepts each socket connection as it arrives and starts a new thread (a RequestHandler) to handle the request. Since the server can accept new socket connections before an earlier request has been entirely processed, multiple clients can have their requests served simultaneously.
The proxy server also implements blocking and caching. It caches websites requested by clients and blocks the websites listed in the block sites file in the directory.
Because response time is crucial for the proxy server, it is advisable to store references to currently blocked and cached websites in a data structure with expected constant-time lookup (a HashMap). If the file is not in the cache, the lookup adds very little overhead; if the file is in the cache and saved in the directory, performance improves.
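A thread-per-connection accept loop along these lines can be sketched as follows. The class names `ProxySketch` and `RequestHandlerSketch` are hypothetical stand-ins for the Proxy and RequestHandler classes, and the handler here is only a placeholder that answers every request with 501 instead of forwarding it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical accept loop: one new thread per client connection. */
class ProxySketch implements Runnable {
    private final ServerSocket serverSocket;

    ProxySketch(int port) throws IOException {
        serverSocket = new ServerSocket(port);   // port 0 picks any free port
    }

    int port() { return serverSocket.getLocalPort(); }

    void stop() throws IOException { serverSocket.close(); }

    @Override
    public void run() {
        while (!serverSocket.isClosed()) {
            try {
                Socket client = serverSocket.accept();
                // Hand the connection to its own thread so that accept()
                // can run again before this request is finished.
                new Thread(new RequestHandlerSketch(client)).start();
            } catch (IOException e) {
                if (!serverSocket.isClosed()) {
                    e.printStackTrace();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        new ProxySketch(6969).run();   // 6969 is the port used in this document
    }
}

/** Placeholder handler: reads the request head and answers 501. */
class RequestHandlerSketch implements Runnable {
    private final Socket client;

    RequestHandlerSketch(Socket client) { this.client = client; }

    @Override
    public void run() {
        try (Socket c = client) {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(c.getInputStream(), StandardCharsets.ISO_8859_1));
            String line = in.readLine();
            if (line == null) {
                return;   // client closed the connection without sending a request
            }
            // Consume header lines up to the blank line that ends the request head.
            while (line != null && !line.isEmpty()) {
                line = in.readLine();
            }
            OutputStream out = c.getOutputStream();
            out.write(("HTTP/1.1 501 Not Implemented\r\n"
                    + "Content-Length: 0\r\nConnection: close\r\n\r\n")
                    .getBytes(StandardCharsets.US_ASCII));
            out.flush();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```

Starting a thread per connection is the simplest way to keep a slow client from blocking others; a thread pool would bound resource use but is not described in the text.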
The RequestHandler services the requests that come through to the proxy:
- HTTP GET requests.
- HTTP GET requests for files contained in the cache.
- HTTPS CONNECT requests.
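Telling these three kinds of request apart comes down to inspecting the request line. The sketch below is one possible way to do it, assuming the set of cached URLs is available to the handler; `RequestDispatchSketch`, `Kind` and `classify` are hypothetical names.

```java
import java.util.Set;

/** Hypothetical classifier for the request line received by the proxy. */
class RequestDispatchSketch {
    enum Kind { HTTPS_CONNECT, CACHED_GET, HTTP_GET, UNSUPPORTED }

    /**
     * Classifies a request line such as "GET http://example.com/ HTTP/1.1".
     * cachedUrls holds the URLs that currently have a copy in the cache.
     */
    static Kind classify(String requestLine, Set<String> cachedUrls) {
        String[] parts = requestLine.trim().split("\\s+");
        if (parts.length < 2) {
            return Kind.UNSUPPORTED;          // no method and target present
        }
        String method = parts[0];
        String target = parts[1];
        if (method.equals("CONNECT")) {
            return Kind.HTTPS_CONNECT;
        }
        if (method.equals("GET")) {
            return cachedUrls.contains(target) ? Kind.CACHED_GET : Kind.HTTP_GET;
        }
        return Kind.UNSUPPORTED;              // other methods are not handled here
    }

    public static void main(String[] args) {
        Set<String> cached = Set.of("http://example.com/");
        System.out.println(classify("GET http://example.com/ HTTP/1.1", cached));     // CACHED_GET
        System.out.println(classify("GET http://example.org/ HTTP/1.1", cached));     // HTTP_GET
        System.out.println(classify("CONNECT example.com:443 HTTP/1.1", cached));     // HTTPS_CONNECT
    }
}
```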
An HTTP GET request is the standard request made when a client tries to load a webpage. The proxy handles it as follows:
- Parse out the URL associated with the request.
- Create an HTTP connection to this URL.
- Echo the client's GET request to the remote server.
- Echo the server's response back to the client and save a copy of the file in the directory that serves as the proxy's cache.
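The GET steps above could be sketched as below. Several details are assumptions rather than facts from the text: `HttpGetSketch` and its methods are hypothetical names, the cache file name is derived from the URL's hash code, and `HttpURLConnection` issues its own GET instead of echoing the client's request verbatim. Error responses (4xx/5xx) surface as an `IOException` and are not relayed in this sketch.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/** Hypothetical handling of an uncached HTTP GET request. */
class HttpGetSketch {
    /** Extracts the URL from a request line such as "GET http://host/path HTTP/1.1". */
    static String parseUrl(String requestLine) {
        String[] parts = requestLine.trim().split("\\s+");
        if (parts.length != 3 || !parts[0].equals("GET")) {
            throw new IllegalArgumentException("not a GET request line: " + requestLine);
        }
        return parts[1];
    }

    /** Copies the server's data to the client and, at the same time, to the cache file. */
    static void relayAndCache(InputStream fromServer, OutputStream toClient, File cacheFile)
            throws IOException {
        try (OutputStream toCache = new FileOutputStream(cacheFile)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = fromServer.read(buf)) != -1) {
                toClient.write(buf, 0, n);
                toCache.write(buf, 0, n);
            }
            toClient.flush();
        }
    }

    /** Fetches the URL named in the request line, relays the body, returns the cache file. */
    static File handleGet(String requestLine, OutputStream toClient, File cacheDir)
            throws IOException {
        String url = parseUrl(requestLine);
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        // Assumed naming scheme: one file per URL, named after the URL's hash code.
        File cacheFile = new File(cacheDir, Integer.toHexString(url.hashCode()));
        try (InputStream body = conn.getInputStream()) {   // throws for 4xx/5xx statuses
            String reason = conn.getResponseMessage() == null ? "" : conn.getResponseMessage();
            toClient.write(("HTTP/1.0 " + conn.getResponseCode() + " " + reason + "\r\n\r\n")
                    .getBytes(StandardCharsets.US_ASCII));
            relayAndCache(body, toClient, cacheFile);
        }
        return cacheFile;
    }
}
```

In this sketch only the response body is cached; whether the original implementation also stores response headers is not stated in the text.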
An HTTP GET request for a cached file is a typical client request for which the file is already contained in the proxy's cache:
- Parse out the URL associated with the request.
- Hash the URL and use this as the key for the HashMap data structure.
- Open the file found for reading.
- Echo the contents of the file back to the client.
- Close the file.
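Serving a cache hit might look like the following sketch. It assumes the map goes from URL to the file holding the cached body and that the proxy prepends a minimal status line itself; `CachedGetSketch` and its methods are hypothetical names. Note that `HashMap` hashes the key internally, so the URL string can be used as the key directly.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cache index: URL -> file that holds the cached response body. */
class CachedGetSketch {
    private final Map<String, File> cache = new HashMap<>();

    void addCachedPage(String url, File file) {
        cache.put(url, file);
    }

    /**
     * Writes the cached copy of the URL to the client.
     * Returns false (and writes nothing) if the URL is not cached.
     */
    boolean serveFromCache(String url, OutputStream toClient) throws IOException {
        File file = cache.get(url);               // expected constant-time lookup
        if (file == null || !file.isFile()) {
            return false;
        }
        toClient.write("HTTP/1.0 200 OK\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
        // try-with-resources opens the file for reading and closes it afterwards.
        try (InputStream in = new FileInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                toClient.write(buf, 0, n);        // echo the file contents to the client
            }
        }
        toClient.flush();
        return true;
    }
}
```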
HTTPS CONNECT requests tunnel HTTPS over HTTP. Since the CONNECT request itself uses ordinary HTTP, it is not encrypted, so the proxy can read the destination address of the client's HTTPS request that it contains.
- The client sends a CONNECT request.
- The proxy extracts the destination URL.
- The proxy establishes a connection to the remote server indicated by the URL using a normal socket.
- If the connection is successfully established, the proxy notifies the client with a "Connection Established" (200) response, allowing the client to begin sending the encrypted data to the proxy.
- The proxy then relays any data sent by the client to the distant server, and vice versa.
Since all of this data is encrypted, the proxy is unable to decrypt it, let alone cache it.
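The tunneling steps above can be sketched as follows. `ConnectTunnelSketch` and its methods are hypothetical names, port 443 is assumed when the request names no port, and the caller is assumed to have already read the complete CONNECT request head from the client socket without buffering past it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical CONNECT tunnel: relays bytes in both directions without decoding them. */
class ConnectTunnelSketch {
    /** Parses the target of a request line such as "CONNECT example.com:443 HTTP/1.1". */
    static InetSocketAddress parseTarget(String requestLine) {
        String[] parts = requestLine.trim().split("\\s+");
        if (parts.length < 2 || !parts[0].equals("CONNECT")) {
            throw new IllegalArgumentException("not a CONNECT request line: " + requestLine);
        }
        String hostPort = parts[1];
        int colon = hostPort.lastIndexOf(':');
        String host = colon >= 0 ? hostPort.substring(0, colon) : hostPort;
        int port = colon >= 0 ? Integer.parseInt(hostPort.substring(colon + 1)) : 443;
        return InetSocketAddress.createUnresolved(host, port);
    }

    /** Copies bytes from in to out until in reaches end of stream or an error occurs. */
    static void pump(InputStream in, OutputStream out) {
        byte[] buf = new byte[8192];
        int n;
        try {
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                out.flush();
            }
        } catch (IOException e) {
            // Either side closing its socket ends the relay in this direction.
        }
    }

    /** Connects to the target, confirms to the client, then relays data both ways. */
    static void tunnel(Socket client, String requestLine) throws IOException {
        InetSocketAddress target = parseTarget(requestLine);
        try (Socket server = new Socket(target.getHostString(), target.getPort())) {
            InputStream fromClient = client.getInputStream();
            OutputStream toClient = client.getOutputStream();
            InputStream fromServer = server.getInputStream();
            OutputStream toServer = server.getOutputStream();

            toClient.write("HTTP/1.1 200 Connection Established\r\n\r\n"
                    .getBytes(StandardCharsets.US_ASCII));
            toClient.flush();

            // One thread per direction; the proxy only ever sees encrypted bytes.
            Thread clientToServer = new Thread(() -> {
                pump(fromClient, toServer);
                try {
                    server.shutdownOutput();   // tell the server the client is done sending
                } catch (IOException ignored) {
                    // the server side is already closed
                }
            });
            clientToServer.start();
            pump(fromServer, toClient);        // returns once the server closes its side
        }
    }
}
```

The caller remains responsible for closing the client socket once `tunnel` returns, which also ends the client-to-server thread if it is still waiting for data.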
Advantages:
- The proxy server can cache HTTP requests to speed up access to previously accessed web pages and reduce waiting times.
- Access to HTTP websites can be blocked or limited.
- CONNECT tunneling allows HTTPS connections even though the server is an HTTP proxy. The tunneled traffic stays encrypted with SSL/TLS between client and server, which increases data security and guards against Man in the Middle attacks.
- Offers safer internet browsing and improved network performance.
Limitations:
- Because the connection is made over CONNECT and the data is encrypted, HTTPS requests cannot be cached.
- Users cannot block or limit access to HTTPS sites.
- HTTP requests are not secure because they are not encrypted.
- Adequate security precautions must be taken before transmission, because using a proxy server amounts to allowing a third party to access the data.
• The program creates cached_sites.txt and block_sites.txt if they are not found in the directory.