# Email Tracking Data

When emails go out, we do two things to the outgoing message: we add a hidden tracking pixel to the bottom of the HTML, and we modify the links in the message. This allows us to track whether a contact opened the email (pixel) and which links they clicked.
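As a rough illustration of those two modifications, here is a minimal sketch. The tracking host, the `/click` and `/open` paths, the query parameter names, and the function name `instrument_html` are all hypothetical; the real service may rewrite messages quite differently.

```python
import html
import re
from urllib.parse import urlencode

# Hypothetical tracking endpoint; the real host and paths are assumptions.
TRACKING_HOST = "https://track.example.com"


def instrument_html(body: str, send_id: str, contact_id: str) -> str:
    """Rewrite http(s) links and add a hidden tracking pixel to an HTML body."""

    def rewrite_link(match: re.Match) -> str:
        # Decode entities such as &amp; before embedding the original URL.
        original_url = html.unescape(match.group(2))
        query = urlencode({"s": send_id, "c": contact_id, "u": original_url})
        tracked_url = html.escape(f"{TRACKING_HOST}/click?{query}", quote=True)
        return f"{match.group(1)}{tracked_url}{match.group(3)}"

    # Only double-quoted absolute http(s) hrefs are handled in this sketch.
    body = re.sub(
        r'(<a\b[^>]*?\bhref=")(https?://[^"]+)(")',
        rewrite_link,
        body,
        flags=re.IGNORECASE,
    )

    pixel_query = urlencode({"s": send_id, "c": contact_id})
    pixel_url = html.escape(f"{TRACKING_HOST}/open?{pixel_query}", quote=True)
    pixel = f'<img src="{pixel_url}" width="1" height="1" alt="" style="display:none">'

    # Place the pixel just before </body> if present, otherwise at the end.
    closing = re.search(r"</body\s*>", body, flags=re.IGNORECASE)
    if closing is None:
        return body + pixel
    return body[: closing.start()] + pixel + body[closing.start():]
```

A regular expression is used here only to keep the sketch short; a production rewriter would more likely use a real HTML parser.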

When a pixel or link request comes in, we know the IP address and the User Agent, along with who the contact is and which link they are asking for. The User Agent is a string supplied by the browser that helps identify the browser type, e.g. an iPhone or Chrome on Windows.
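One way to picture a single tracking request is as a small record. This is only a sketch: the class name, field names, and the `timestamp` field (assumed to be the time the server received the request) are illustrative and not the actual schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class TrackingRequest:
    """One pixel or link request (hypothetical field names)."""

    timestamp: datetime          # assumed: when the request was received
    ip: str                      # IP address the request came from
    user_agent: str              # User Agent string sent by the client
    contact_id: str              # who the contact is
    send_id: str                 # which message the request belongs to
    link: Optional[str] = None   # requested link; None for a pixel request

    @property
    def kind(self) -> str:
        """Return "open" for pixel requests and "click" for link requests."""
        return "open" if self.link is None else "click"
```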

## Motivation

We are starting to see more and more requests from bots. We still get requests generated by the actual contacts, but bot activity is now at the point where it is significantly skewing our tracking data.

In the past, bot detection was straightforward, allowing us to discard data based on a set of IP ranges and/or User Agent strings. Recently, however, the bots have become smarter, making detection a major problem.

Most, if not all, of the bots are not malicious. Most come from anti-phishing/malware devices that are there to protect the contact from harm. To catch phishing campaigns, however, it is important that these protective bots not be easily detected, so that malicious actors can't evade the service. That same property makes them hard for us to filter out.

## Sessionization
Sessionization is a way of grouping clusters of requests together. If you open a message and click on 3 links on your iPhone, that should be considered 1 session. If you open the same email at home on your Mac, those requests should be seen as a different session. If you only use your phone to open the message, but read the message once in the morning and once in the afternoon, these 2 periods of time should be different sessions. A session therefore continues for as long as each request arrives no more than 120 seconds after the previous one; a longer gap starts a new session. If you click 2 links in a message and click a 3rd link 5 minutes later, this counts as 2 sessions.
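The 120 second rule can be sketched in a few lines of plain Python. This is an illustrative sketch, not the SQL currently in use; the function name `split_into_sessions` is hypothetical, and it assumes the timestamps already belong to one grouping key.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(seconds=120)


def split_into_sessions(timestamps, gap=SESSION_GAP):
    """Split request timestamps into sessions separated by gaps longer than `gap`."""
    sessions = []
    for ts in sorted(timestamps):
        # Compare against the last request of the current session.
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


# Two clicks close together, then a third click 5 minutes later.
start = datetime(2024, 1, 1, 9, 0, 0)
clicks = [
    start,
    start + timedelta(seconds=30),
    start + timedelta(seconds=30) + timedelta(minutes=5),
]
sessions = split_into_sessions(clicks)
print(len(sessions))  # 2
```

Note that the gap is measured from the previous request, not from the start of the session, so a long run of closely spaced requests stays in one session.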

## Data Columns

- SessionID            UUID   An identifier that groups requests into sessions
- SessionDate          datetime  The date and time of the first request in the session
- SessionDuration      int64  How many seconds the session lasted
- RequestCount         int64  How many requests were in the session
- CompanyCount**       int64  How many different companies the requested messages in the session belonged to
- IPCount**            int64  How many different IP addresses were in the session
- IP3OctectCount**     int64  How many IP addresses that shared the first 3 octets were in the session
- UniqueMessageCount** int64  How many different messages (SendIDs) were seen in the session
- UserAgentCount       int64  How many different User Agent strings were seen in the session
- OpenCount            int64  How many times the pixel image was requested in the session
- ClickCount           int64  How many times a link was clicked in the session (same link or different links)
- UniqueLinkCount      int64  How many unique links were clicked in the session

** Only applies to some sessionization approaches
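
To make the columns concrete, the sketch below derives them from the requests of one session. It rests on several assumptions: requests are plain dicts with hypothetical keys (`timestamp`, `ip`, `user_agent`, `send_id`, `company_id`, `link`), a `link` of `None` marks a pixel request, addresses are IPv4, and `IP3OctectCount` is interpreted as the number of distinct 3-octet prefixes, which is only one possible reading of the description above.

```python
from uuid import uuid4


def summarize_session(requests):
    """Compute the session columns for a non-empty list of request dicts."""
    times = sorted(r["timestamp"] for r in requests)
    clicks = [r for r in requests if r["link"] is not None]
    ips = {r["ip"] for r in requests}
    return {
        "SessionID": str(uuid4()),
        "SessionDate": times[0],
        "SessionDuration": int((times[-1] - times[0]).total_seconds()),
        "RequestCount": len(requests),
        "CompanyCount": len({r["company_id"] for r in requests}),
        "IPCount": len(ips),
        # Assumed reading: distinct "a.b.c" prefixes among IPv4 addresses.
        "IP3OctectCount": len({ip.rsplit(".", 1)[0] for ip in ips}),
        "UniqueMessageCount": len({r["send_id"] for r in requests}),
        "UserAgentCount": len({r["user_agent"] for r in requests}),
        "OpenCount": len(requests) - len(clicks),
        "ClickCount": len(clicks),
        "UniqueLinkCount": len({r["link"] for r in clicks}),
    }
```

The columns marked with ** would be constant (always 1) under an approach that already groups on that field, which is presumably why they only apply to some approaches.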

## Sessionization Approaches

There are 3 approaches to sessionization:
- IP and SendID - We group on both the IP address and the SendID, i.e. a session only contains requests that the same IP address made for the same message
- IP Only - We group on just the IP address, so a session can include requests for different messages
- SendID Only - We group on just the SendID, so a session can include requests from different IP addresses
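
The three approaches differ only in the key used to group requests before the 120 second gap rule is applied. A minimal sketch, assuming requests are dicts with hypothetical `ip`, `send_id`, and `timestamp` keys (the approach names and the `sessionize` function are illustrative, not the current SQL):

```python
from collections import defaultdict
from datetime import datetime, timedelta

# One grouping key per approach (names are illustrative).
GROUP_KEYS = {
    "ip_and_send_id": lambda r: (r["ip"], r["send_id"]),
    "ip_only": lambda r: (r["ip"],),
    "send_id_only": lambda r: (r["send_id"],),
}


def sessionize(requests, approach, gap=timedelta(seconds=120)):
    """Group requests by the chosen key, then split each group on gaps > `gap`."""
    key = GROUP_KEYS[approach]
    groups = defaultdict(list)
    for request in requests:
        groups[key(request)].append(request)

    sessions = []
    for group in groups.values():
        group.sort(key=lambda r: r["timestamp"])
        current = [group[0]]
        for request in group[1:]:
            if request["timestamp"] - current[-1]["timestamp"] <= gap:
                current.append(request)
            else:
                sessions.append(current)
                current = [request]
        sessions.append(current)
    return sessions


t0 = datetime(2024, 1, 1, 9, 0, 0)
requests = [
    {"ip": "10.0.0.1", "send_id": "s1", "timestamp": t0},
    {"ip": "10.0.0.1", "send_id": "s2", "timestamp": t0 + timedelta(seconds=10)},
    {"ip": "10.0.0.2", "send_id": "s1", "timestamp": t0 + timedelta(seconds=20)},
]
for approach in GROUP_KEYS:
    print(approach, len(sessionize(requests, approach)))
```

With this sample data, grouping on both fields yields 3 sessions, while grouping on the IP alone or the SendID alone yields 2 each, because requests that differ only in the ignored field are merged.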

## NOTE:  

The sessionization is done in SQL for now. The plan is to move the obfuscated data here and use Python window functions to do the sessionization.