# Notes On Reverse Engineering NKU Course Selection System

These are runnable notes that explain the backend mechanism of the EAMIS system. For security reasons, I did not keep the actual output from my development process. Enter your EAMIS credentials and run the entire notebook to see real results.

In [None]:
account = "youraccount" # for example 2911311 
password = "yourpassword" # this will only be used for login

## Prerequisites

This was originally implemented with `selenium` with minimal knowledge of the system. The code was later refactored to use `httpx` and `BeautifulSoup` for better performance and reliability.

In [None]:
import datetime
import hashlib
import json
import re
import time
from pprint import pprint

import hjson
import httpx
from bs4 import BeautifulSoup
from Crypto.Cipher import AES
from Crypto.Util.Padding import pad

In [None]:
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "Accept-Language": "en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7",
    "Accept-Encoding": "gzip, deflate, br",
    "DNT": "1",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Cache-Control": "max-age=0",
    "Sec-Ch-Ua": '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"',
    "Sec-Ch-Ua-Mobile": "?0",
    "Sec-Ch-Ua-Platform": '"macOS"',
}
client = httpx.Client()
client.headers.update(headers)


def print_request_info(response: httpx.Response):
    soup = BeautifulSoup(response.content, "lxml")
    print("----------------------------")
    print("URL:", response.url)
    print("Status Code:", response.status_code)
    print("Headers:")
    pprint(dict(response.headers))
    print("Cookies:")
    pprint(dict(client.cookies))
    print("Content:", soup.prettify(), "\n")


def create_timestamp():
    now = datetime.datetime.now()
    timestamp = int(now.timestamp() * 1000)
    return timestamp


eamis_main = httpx.URL("https://eamis.nankai.edu.cn")
login_main = httpx.URL("https://iam.nankai.edu.cn")

## Login

Upon entering `https://eamis.nankai.edu.cn`, the website first redirects to `https://eamis.nankai.edu.cn/eams/login.action`. This is an HTTP redirect, which `httpx` does not follow unless `follow_redirects=True` is set.

The cookie expires after 12 hours. In real usage, however, the session lasts much less than that.

In [None]:
prelogin_response = httpx.get(eamis_main, follow_redirects=False)
# raises httpx.ConnectTimeout when not in the same network
print_request_info(prelogin_response)

The website then redirects 4 times in total to reach the final login page.

1. Where does the CSRF token come from?

    The `csrf-token` cookie is set by the server upon entering the page `https://eamis.nankai.edu.cn/eams/homeExt.action`. It is a security token to prevent CSRF attacks, and the login request sends it back in the `csrf-token` header. The session itself is tracked by a separate `JSESSIONID` cookie, which is used in subsequent requests to validate the session.
2. Does the `request` contain the correct headers and cookies after redirection?

   Cookies, yes. Headers, no: in a browser they are set by JavaScript functions, so here they have to be supplied manually.

During the redirection, I believe the server uses `JSESSIONID` to remember the original destination of the session.

In [None]:
prelogin_redirection = client.get(
    eamis_main.join("/eams/homeExt.action"),
    headers=headers,
    follow_redirects=True,
)
for history_response in prelogin_redirection.history:
    print_request_info(history_response)
print_request_info(prelogin_redirection)

Here is the **key API for login**: `"https://iam.nankai.edu.cn/api/v1/login?os=web"`

What's interesting is that the `login_scene` field carries the value `feilian`. Does that mean all requests are sent to the VPN and then redirected to the server?

In [None]:
login_response = client.post(
    "https://iam.nankai.edu.cn/api/v1/login?os=web",
    headers={
        # Experimentally, only `content-type`, `csrf-token` and `referer` are needed
        "accept": "*/*",
        "accept-language": "zh-CN",
        "cache-control": "no-cache",
        "content-type": "application/json",
        "csrf-token": client.cookies.get("csrf-token", ""),  # type: ignore
        "pragma": "no-cache",
        "sec-fetch-dest": "empty",
        "sec-fetch-mode": "cors",
        "sec-fetch-site": "same-origin",
        "x-fe-version": "3.0.9.8465",
        "x-version-check": "0",
        "referer": str(prelogin_redirection.url),
    },
    json={
        "login_scene": "feilian",
        "account_type": "userid",
        "account": account,
        # NOTE: encrypt_password is defined in a later cell; run that cell first
        "password": encrypt_password(password),
    },
    follow_redirects=True,
)

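Possible responses (the first message reads "Wrong username or password, please enter again; if you are sure they are correct, contact the administrator", the last reads "Parameter error"):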

```
{"code":10110001,"action":"alert","message":"用户名或密码错误，请再次输入，如确认无误可联系管理员排查。(10110001)"}

{"code":0,"action":"","message":"","data":{"result":"success","next":{"action":"GoToLink","can_skip":false}}}

{"code":40000,"action":"alert","message":"参数错误"}
```
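
The sample bodies above can be classified by their `code` field alone. A minimal sketch, assuming only what the samples show (`code == 0` means success, anything else is a failure with a human-readable `message`); the `LoginResult` type and `parse_login_response` helper are names made up for illustration:

```python
import json
from dataclasses import dataclass


@dataclass
class LoginResult:
    ok: bool
    code: int
    message: str


def parse_login_response(body: str) -> LoginResult:
    """Classify a login response body by its `code` field (0 means success)."""
    payload = json.loads(body)
    code = payload.get("code", -1)  # -1 is an invented sentinel for "no code field"
    return LoginResult(ok=(code == 0), code=code, message=payload.get("message", ""))


# Two of the sample bodies listed above
samples = [
    '{"code":0,"action":"","message":"","data":{"result":"success","next":{"action":"GoToLink","can_skip":false}}}',
    '{"code":40000,"action":"alert","message":"参数错误"}',
]
for sample in samples:
    print(parse_login_response(sample))
```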

In [None]:
print_request_info(login_response)

Note:

After 3 hours of searching in Rust code, I finally learnt that:
+ The Rust `reqwest` library automatically handles the `Accept-Encoding` header, whose content depends on the features enabled during compilation.
+ If `Accept-Encoding` is set manually and conflicts with the one set by `reqwest`, an encoding error occurs.

For password encryption, navigate to the original site and you will eventually track it down to the `index.js` file, which contains the encryption function. The encryption is done with the `CryptoJS` library.

```javascript
function h(e) {
    const t = (r = Number.MAX_SAFE_INTEGER.toString(),
    o().MD5(r).toString()).toString()
        , n = o().SHA1(o().enc.Utf8.parse(t));
    var r;
    return o().AES.encrypt(e, o().enc.Utf8.parse(t), {
        iv: o().enc.Utf8.parse(n.toString(o().enc.Hex)),
        mode: o().mode.CBC,
        padding: o().pad.Pkcs7
    }).ciphertext.toString(o().enc.Hex)
}
```
This is equivalent to the following Python code (generated with AI):

In [None]:
def encrypt_password(password):
    # Step 1: Generate key (same as JavaScript)
    max_safe_int = "9007199254740991"
    t = hashlib.md5(max_safe_int.encode()).hexdigest()
    # Step 2: Generate IV hash (same as JavaScript)
    n_hex = hashlib.sha1(t.encode()).hexdigest()
    # Step 3: Convert hex string to bytes as UTF-8 (this is the key part!)
    # JavaScript: CryptoJS.enc.Utf8.parse(n.toString(CryptoJS.enc.Hex))
    iv_bytes = n_hex.encode("utf-8")  # Treat hex string as UTF-8 text
    # Step 4: Prepare for AES
    key_bytes = t.encode("utf-8")  # MD5 hex string as UTF-8 bytes
    iv_for_aes = iv_bytes[:16]  # Take first 16 bytes for AES
    cipher = AES.new(key_bytes[:32], AES.MODE_CBC, iv_for_aes)
    encrypted = cipher.encrypt(pad(password.encode("utf-8"), 16))

    return encrypted.hex()


encrypt_password(password)
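
One detail worth spelling out: the key and IV depend only on a constant, so they are the same for every user and every login. A minimal stdlib-only sketch that derives both and shows why the cipher ends up as AES-256 with a 16-byte IV (`derive_key_and_iv` is a name made up for this note, not something from the site):

```python
import hashlib

MAX_SAFE_INTEGER = "9007199254740991"  # Number.MAX_SAFE_INTEGER.toString() in JavaScript


def derive_key_and_iv() -> tuple[bytes, bytes]:
    """Reproduce the key/IV derivation of the Python port above, without the AES step."""
    key_text = hashlib.md5(MAX_SAFE_INTEGER.encode()).hexdigest()  # 32 hex characters
    iv_text = hashlib.sha1(key_text.encode()).hexdigest()  # 40 hex characters
    key = key_text.encode("utf-8")  # hex text taken as UTF-8: 32 bytes, so AES-256
    iv = iv_text.encode("utf-8")[:16]  # the port above keeps only the first 16 bytes
    return key, iv


key, iv = derive_key_and_iv()
print(len(key), len(iv))  # 32 16
```

Because nothing here depends on the password or the session, both values could be computed once and cached.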

If you look closely, you will find that the `referer` header sent by the browser at login is not identical to `prelogin_redirection.url` after redirection.
```
(Request) "https://iam.nankai.edu.cn/login?next=%2Fapi%2Fcas%2Flogin%3Fservice%3Dhttps%253A%252F%252Feamis.nankai.edu.cn%252Feams%252Flogin.action%253Bjsessionid%253D9A9E5358E24CEC4B2AE07D2480BA4965.std7"
(Browser) "https://iam.nankai.edu.cn/login?next=%2Fapi%2Fcas%2Flogin%3Fservice%3Dhttps%253A%252F%252Feamis.nankai.edu.cn%252Feams%252Flogin.action"
```

This might be due to server-side logic or JavaScript behavior. It does not affect the login process, but it is a possible signal the server could use to track down bots. Here are some strategies to implement in production:
1. Add random delay between requests
2. Request intermediate pages to mimic browser behavior
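
A minimal sketch of the first strategy, assuming a uniform jitter around a base delay is good enough; the helper names and the default values are illustrative, not tuned against the real server:

```python
import random
import time


def jittered_delay(base: float = 1.0, jitter: float = 0.5) -> float:
    """Draw a delay uniformly from [base - jitter, base + jitter], never below 0."""
    return max(0.0, random.uniform(base - jitter, base + jitter))


def pause(base: float = 1.0, jitter: float = 0.5) -> None:
    """Sleep for a randomized interval; call this between consecutive requests."""
    time.sleep(jittered_delay(base, jitter))
```

Calling `pause()` before each `client.get(...)` or `client.post(...)` spreads requests out so they do not arrive at fixed intervals.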


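As for whether hiding the crypto library behind a bundled accessor like `o()` (as in the `index.js` snippet above) is good practice: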

> **Is This Good Practice?**
> 
> Generally, no, but it serves specific purposes.
>
> **Legitimate Uses:**
> 
> - Build optimization (smaller bundle size)
> - Lazy loading (better performance)
> - Module compatibility (works in different environments)
> 
> **Questionable Uses:**
> 
> - Security through obscurity (not real security)
> - Making code harder to understand (maintenance nightmare)
> - Avoiding detection (hiding what crypto library is used)


The login api returns a JSON response with `code` and `action` fields. 

TODO: find more about api response

In [None]:
login_status = json.loads(login_response.content)
print("Login Status:", login_status)
if login_status.get("code") == 0:
    print("Login successful!")
else:
    print(login_status.get("message", "Login failed for unknown reason."))

In [None]:
postlogin_response = client.get(
    login_main.join(login_status["data"]["next"]["link"]),
    follow_redirects=True,
)

After login, the response points to another page; after 4 more redirections we reach the final page: `https://eamis.nankai.edu.cn/eams/home.action`.

Once logged in, the session is stored in the `JSESSIONID` cookie, which is used for subsequent requests. The CSRF token is no longer involved from this point on.

One interesting question is the maximum time allowed between login and action. The cookie expires after 12 hours, but the real limit can be found with a simple bisection search. The result is 30 minutes with a precision of 2 minutes, and it was confirmed by a second test.
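
The bisection can be sketched as follows. This is an illustration under stated assumptions, not the script actually used: `still_valid_after` is a hypothetical probe that logs in, idles for the given number of minutes, and reports whether a request still succeeds; validity is assumed to be monotonic (once expired, always expired).

```python
from typing import Callable


def find_max_idle_minutes(
    still_valid_after: Callable[[float], bool],
    low: float = 0.0,
    high: float = 720.0,  # 12 hours, the nominal cookie lifetime
    precision: float = 2.0,
) -> float:
    """Bisect for the longest idle time (in minutes) after which the session still works."""
    while high - low > precision:
        mid = (low + high) / 2
        if still_valid_after(mid):
            low = mid  # session survived: the limit is at least `mid`
        else:
            high = mid  # session expired: the limit is below `mid`
    return low


# Stand-in probe for demonstration only: pretend sessions die after 30 minutes.
def fake_probe(idle_minutes: float) -> bool:
    return idle_minutes <= 30


estimate = find_max_idle_minutes(fake_probe)
print(estimate)
```

With a real probe every step costs a fresh login plus the idle wait itself, so the upper bound is worth lowering by hand before starting.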

In [None]:
print(postlogin_response.history)
for history_response in postlogin_response.history:
    print_request_info(history_response)
print_request_info(postlogin_response)

## Front End

In the EAMIS system, different menus have different IDs and API endpoints.
For example:
- "My" page: `https://eamis.nankai.edu.cn/eams/home!submenus.action?menu.id=&_=1749625169269`
- Main page: `https://eamis.nankai.edu.cn/eams/home!welcome.action?_=1749625169270`
- Course selection page: `https://eamis.nankai.edu.cn/eams/stdElectCourse.action?_=1749625169272`

Here we skip the main page request and directly request the course selection page. (The main page request might be implemented in the future.)

In [None]:
course_select_menu_response = client.get(
    "https://eamis.nankai.edu.cn/eams/stdElectCourse.action",
    headers={
        "Referer": str(postlogin_response.url),
        "X-Requested-With": "XMLHttpRequest",
    },
    params={"_": "1749625169272"},
    follow_redirects=True,
)
print_request_info(course_select_menu_response)

It returns HTML content, which can be parsed with BeautifulSoup to extract *profile* information (Different categories for courses).

In [None]:
soup = BeautifulSoup(course_select_menu_response.content, "lxml")
selection_divs = soup.find_all("div", id=re.compile(r"^electIndexNotice\d+$"))

course_categories: list[dict] = []
for div in selection_divs:
    # Extract title from h3 element
    title_element = div.find("h3")  # type: ignore
    title = title_element.get_text(strip=True) if title_element else None  # type: ignore

    # Extract href from the link
    link_element = div.find("a", href=True)  # type: ignore
    href = link_element["href"] if link_element else None  # type: ignore

    if title and href:
        course_categories.append(
            {
                "title": title,
                "href": href,
                "url": str(eamis_main.join(href)),  # type: ignore
                "id": href.split("=")[-1] if "=" in href else None,  # type: ignore
            }
        )

# Print results
for course_category_example in course_categories:
    print(f"Title: {course_category_example['title']}")
    print(f"Link: {course_category_example['url']}")
    print(f"ID: {course_category_example['id']}")
    print("-" * 50)

The front end is actually more interesting than you might think.

First, as you can see, all APIs take a UNIX timestamp as a parameter (`_`), which is used to prevent caching. This is a common practice in web development to ensure that the client always receives the latest data from the server. The actual time value, however, does not matter at all: requests sent with an old timestamp still work.

Second, the APIs have `!` and `action` in the URL, which is not a general web design standard but a specific convention used by the Apache Struts framework.
- stdElectCourse: This part of the URL maps to a Java class, often called an Action class (e.g., StdElectCourseAction.java). This class is responsible for handling user requests related to course selection.

- .action: This is the default extension that the Struts framework uses to identify requests that it should process.

- ! (The "Bang" Notation): This is the key part. It's a feature in Struts called "Dynamic Method Invocation" (DMI). It allows a single Action class to have multiple methods that can be called directly from a URL.
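
To make the naming convention concrete, here is a small sketch that splits such a URL into the action name and the method named after the `!`. The helper `split_struts_action` is made up for illustration and only looks at the URL shape; it says nothing about how the server dispatches the call.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def split_struts_action(url: str) -> Tuple[str, Optional[str]]:
    """Split '.../stdElectCourse!data.action' into ('stdElectCourse', 'data')."""
    last_segment = urlsplit(url).path.rsplit("/", 1)[-1]
    if last_segment.endswith(".action"):
        last_segment = last_segment[: -len(".action")]
    action, _, method = last_segment.partition("!")
    return action, (method or None)  # None when the URL names no method


print(split_struts_action("https://eamis.nankai.edu.cn/eams/stdElectCourse!data.action"))
print(split_struts_action("https://eamis.nankai.edu.cn/eams/home.action"))
```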

## Backend

Home page API for each profile: `https://eamis.nankai.edu.cn/eams/stdElectCourse!defaultPage.action`,
with `?electionProfile.id={profileID}` appended.

All subsequent API calls should send this URL as the `Referer` header.

In [None]:
course_category_example = course_categories[0]  # Select the first category for demonstration
# Use a dedicated name here rather than reusing `prelogin_redirection` from the login step
profile_home_response = client.get(
    course_category_example["url"],
    headers={
        "Referer": str(course_select_menu_response.url),
        "X-Requested-With": "XMLHttpRequest",
    },
)
print_request_info(profile_home_response)

### Internationalization
These APIs exist for i18n support and return a JSON object of key-value pairs for teacher IDs, classroom IDs, and course IDs. Setting aside the bad practice of returning JS values, the main API already returns both the ID and the Chinese name, which makes the i18n endpoints redundant.

In [None]:
apis = [
    "https://eamis.nankai.edu.cn/eams/stdElectCourse!classroomI18N.action",
    "https://eamis.nankai.edu.cn/eams/stdElectCourse!courseI18N.action",
    "https://eamis.nankai.edu.cn/eams/stdElectCourse!teacherI18n.action",
]
for api in apis:
    i18n_response = client.get(
        api,
        params={"profileId": course_category_example["id"], "lang": "zh"},
        headers={
            "Referer": course_category_example["url"],
            "X-Requested-With": "XMLHttpRequest",
        },
    )
    print_request_info(i18n_response)

### Course Info
**Key API for course info**: `https://eamis.nankai.edu.cn/eams/stdElectCourse!data.action`

@param `profileId` : Profile ID (the value of `electionProfile.id` in the profile URL)

@returns JavaScript containing (H)JSON data with course information (*although this is bad practice in web design*).

In [None]:
course_info = client.get(
    "https://eamis.nankai.edu.cn/eams/stdElectCourse!data.action",
    params={"profileId": course_category_example["id"]},
    headers={
        "Referer": course_category_example["url"],
        "X-Requested-With": "XMLHttpRequest",
    },
)
print_request_info(course_info)

In [None]:
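# The endpoint returns JavaScript, not HTML; the lxml parser wraps the raw text in <body><p>.
course_info_parsed = BeautifulSoup(course_info.content, "lxml")


# Keep everything after the first "=" and drop the last character
# (presumably the trailing ";") to isolate the object literal.
info = (
    course_info_parsed.find("body")
    .find("p")  # type: ignore
    .get_text(strip=True)  # type: ignore
    .split("=", 1)[-1]
    .strip()
)[:-1]
hjson.loads(info)  # hjson accepts relaxed syntax such as unquoted keys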
# with open("course_selection_status.json", "w", encoding="utf-8") as f:
#     json.dump(
#         hjson.loads(info),
#         f,
#         ensure_ascii=False,
#         indent=4,
#     )

This API returns a **HUGE** JS dictionary of course information, but it is actually easy to understand. Take one entry as an example:

```json
{
    "id": 598221, // Course ID, essential for our purpose. Used as parameter in course selection API
    "no": "0053", // Course number, Just for numbering in webview
    "name": "智能软件前沿", // Course name, in Chinese
    "limitCount": 200, // Total student limit for the course
    "planLimitCount": 200, // limit for students in certain plan/major
    "unplanLimitCount": 0, // limit for students not in certain plan/major, actually isn't this simple arithmetic to calculate?
    "code": "INTL0028", // Course code, important for identification
    "credits": 1, // Course credits, extra information but not essential
    "courseId": 20507, // i18n for course name (actually I believe `id` field is more than sufficient)
    "startWeek": 1, // Start week of the course
    "endWeek": 1, // End week of the course (this is a one-week course happening in summer)
    "courseTypeId": 401, // i18n for courseTypeName
    "courseTypeName": "国际学分课程", // Course type name
    "courseTypeCode": "01_09", // I don't know ... but seems unimportant
    "scheduled": true, 
    "hasTextBook": false, // Who cares about textbooks?
    "period": 18, // Total period of the course, but actually it's not endWeek - Start Week + 1...
    "weekHour": 0, // it just looks unimportant...
    "withdrawable": true, // This field I believe indicates if the course can be withdrawn after selection, but aren't all courses withdrawable?
    "langTypeName": "中文", // Language type of the course, not important
    "textbooks": "", // I can't believe there are 2 fields for textbooks...
    "teachers": "朱锦潮", // Teacher name
    "teacherIds": "19005", // i18n for teacher ID
    "campusCode": "04", // I must complain, we only have 2 (actually 3) campuses, but the code is 04 :(
    "campusName": "津南校区", // Campus name
    "midWithdraw": "不能期中退课", // Mid-term withdrawal policy, not important
    "reservedCount": "0", 
    "remark": "", // not important
}
```

```json
{
    "arrangeInfo": [
        {
            "weekDay": 4, // Day of the week
            "weekState": "01000000000000000000000000000000000000000000000000000",
            "startUnit": 7, // Start unit of the course
            "endUnit": 10,  // End unit of the course
            "weekStateDigest": "1",
            "startTime": 1400, // Start time, well that's considerate since it can just be calculated from startUnit and endUnit
            "endTime": 1740, // Same as above
            "expLessonGroup": null, // lesson group information
            "expLessonGroupNo": null, // Same as above
            "roomIds": "1205", // Room IDs, used for i18n
            "rooms": "津南公教楼C区122" // Room name
        },
    // ...
    // Same thing repeating
    ],
    "expLessonGroups": [ // All lesson groups for the course
        {
            "id": 6480, // Lesson group ID, Used for course selection with Groups
            "indexNo": 1, // Lesson group index number
            "stdCount": 200,
            "getStdCountLimit": 200
        }
    ]
}
```
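
Two of the `arrangeInfo` fields are easier to read once decoded. The sketch below assumes, based only on the single example above (`weekState` has its `1` at index 1 while `startWeek`, `endWeek` and `weekStateDigest` all say week 1), that the character at index `n` of `weekState` marks week `n`. It also assumes `startTime`/`endTime` are `HHMM` integers. Both helper names are made up.

```python
def weeks_from_state(week_state: str) -> list[int]:
    """Return the indices of `week_state` that hold '1', read as week numbers."""
    return [week for week, flag in enumerate(week_state) if flag == "1"]


def format_hhmm(value: int) -> str:
    """Render an integer such as 1400 as '14:00'."""
    return f"{value // 100:02d}:{value % 100:02d}"


state = "01000000000000000000000000000000000000000000000000000"
print(weeks_from_state(state))  # [1]
print(format_hhmm(1400), format_hhmm(1740))  # 14:00 17:40
```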

### Course Selection
**Key API for Course Selection**: `https://eamis.nankai.edu.cn/eams/stdElectCourse!batchOperator.action`

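@param `profileId` : Profile ID

@data : form-encoded body with the fields `optype`, `operator0`, `lesson0`, and (when selecting) `expLessonGroup_{lessonId}`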

In [None]:
from collections import namedtuple


course = namedtuple("Course", ["id", "name", "expLessonGroup"])
course_example = course(id=598178, name="高级语言编程实训2", expLessonGroup=6444)

In [None]:
try:
    course_select_response = client.post(
        "https://eamis.nankai.edu.cn/eams/stdElectCourse!batchOperator.action",
        headers={
            "Referer": course_category_example["url"],
            "X-Requested-With": "XMLHttpRequest",
        },
        data={
            "optype": "true",
            "operator0": f"{course_example.id}:true:0",
            "lesson0": str(course_example.id),
            f"expLessonGroup_{course_example.id}": str(
                course_example.expLessonGroup
            ),  # "undefined" if no expLessonGroup
        },
        params={"profileId": course_category_example["id"]},
    )
    print_request_info(course_select_response)
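except Exception as exc:
    # Keep the notebook running, but report the failure instead of hiding it
    print("Course selection request failed:", exc)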

In [None]:
try:
    course_returned_response = client.post(
        "https://eamis.nankai.edu.cn/eams/stdElectCourse!batchOperator.action",
        headers={
            "Referer": course_category_example["url"],
            "X-Requested-With": "XMLHttpRequest",
        },
        data={
            "optype": "false",  # "false" withdraws the course; "true" (above) selects it
            "operator0": f"{course_example.id}:false:0",
            "lesson0": str(course_example.id),
        },
        params={"profileId": course_category_example["id"]},
    )
    print_request_info(course_returned_response)
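except Exception as exc:
    # Keep the notebook running, but report the failure instead of hiding it
    print("Course withdrawal request failed:", exc)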
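
The two requests above differ only in their form data, so the body can be built by one helper. This is a sketch derived from those two cells and nothing else: the function name is made up, and sending `"undefined"` when a course has no lesson group follows the inline comment in the selection cell.

```python
from typing import Optional


def build_batch_payload(
    lesson_id: int, elect: bool, group_id: Optional[int] = None
) -> dict[str, str]:
    """Build the form data for stdElectCourse!batchOperator.action."""
    flag = "true" if elect else "false"
    payload = {
        "optype": flag,
        "operator0": f"{lesson_id}:{flag}:0",
        "lesson0": str(lesson_id),
    }
    if elect:
        # Only the selection request carries the lesson group field
        payload[f"expLessonGroup_{lesson_id}"] = (
            "undefined" if group_id is None else str(group_id)
        )
    return payload


print(build_batch_payload(598178, elect=True, group_id=6444))
print(build_batch_payload(598178, elect=False))
```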

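Sample responses, in order: the operation failed because course selection is not currently open; selection failed because the course code is not in the training plan (so it counts as an out-of-plan selection) and the out-of-plan quota is full; selection failed because the course has already been selected.

```html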
<body>
    <table width="100%" align="center">
        <tr style="padding-left:20%">
            <td style="text-align:center;">
                <div style="width:85%;color:red;text-align:left;margin:auto;">
                    操作 失败:当前选课不开放</br>
                </div>
            </td>
            <script type="text/javascript">
                if (window.electCourseTable) {
                }
            </script>
        </tr>
        <tr align="center">
            <td id="timeElapsed"></td>
        </tr>
    </table>

    <table width="100%" align="center">
        <tr style="padding-left:20%">
            <td style="text-align:center;">
                <div style="width:85%;color:red;text-align:left;margin:auto;">
                    中外文明交流与互鉴[0022]选课 失败:因课程代码不在培养方案中认定为计划外选课，且计划外名额已满，请核对培养方案或选择其他课程</br>
                </div>
            </td>
            <script type="text/javascript">
                if (window.electCourseTable) {
                    window.electCourseTable.lessons({ id: 598192 })
                        .update({
                            preElect: false,
                            defaultElected: false,
                            elected: false
                        });
                }
            </script>
        </tr>
        <tr align="center">
            <td id="timeElapsed"></td>
        </tr>
    </table>


    <table align="center" width="100%">
        <tr style="padding-left:20%">
            <td style="text-align:center;">
                <div style="width:85%;color:red;text-align:left;margin:auto;">
                    高级语言编程实训2[0057]选课 失败:你已经选过高级语言编程实训2
                </div>
            </td>
            <script type="text/javascript">
                if (window.electCourseTable) {
                    window.electCourseTable.lessons({ id: 598178 })
                        .update({
                            preElect: false,
                            defaultElected: false,
                            elected: false
                        });
                }
            </script>
        </tr>
        <tr align="center">
            <td id="timeElapsed">
            </td>
        </tr>
    </table>
</body>
```
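
The human-readable result sits in the single `<div>` of each response. A stdlib-only sketch for pulling it out (BeautifulSoup would work equally well; `html.parser` just keeps the example dependency-free). Treating any message that contains `失败` ("failed") as a failure is an assumption drawn from the three samples above; the success wording is not shown in these notes.

```python
from html.parser import HTMLParser


class ResultMessageParser(HTMLParser):
    """Collect the text found inside each top-level <div> of a response."""

    def __init__(self) -> None:
        super().__init__()
        self._depth = 0
        self.messages: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self._depth += 1
            if self._depth == 1:
                self.messages.append("")  # start a new message

    def handle_endtag(self, tag):
        if tag == "div" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0:
            self.messages[-1] += data


def extract_messages(html: str) -> list[str]:
    parser = ResultMessageParser()
    parser.feed(html)
    parser.close()
    return [m.strip() for m in parser.messages if m.strip()]


sample = '<td><div style="color:red;">\n  操作 失败:当前选课不开放</br>\n</div></td>'
for message in extract_messages(sample):
    print(message, "| failed:", "失败" in message)
```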

**Key API for Current Course Info**: `https://eamis.nankai.edu.cn/eams/stdElectCourse!queryStdCount.action`

@param `projectId` : Project ID (still unknown)

@param `semesterId` : Semester ID

@param `_` : Timestamp in milliseconds

Now for an interesting behavior: I noticed that the page polls this API every 30 seconds. The GET request carries a parameter that looks like the current UNIX timestamp in milliseconds. What's interesting is that the value grows by exactly 1 on each request, that is, 1 "millisecond" per 30-second poll. It is therefore probably a counter seeded with the page-load time (the same cache-busting scheme jQuery uses for `cache: false`) rather than a real clock. This is likely poor web design :-D
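
A value that starts as a millisecond timestamp and then grows by exactly 1 per request is what a counter seeded once at page load would produce. The three menu URLs listed in the Front End section (`…269`, `…270`, `…272`) fit the same pattern. A minimal sketch of that mechanism, assuming this is indeed how the front end generates `_` (the helper name is made up):

```python
import itertools
import time
from typing import Callable, Optional


def make_cache_buster(start_ms: Optional[int] = None) -> Callable[[], int]:
    """Return a callable that yields start, start + 1, start + 2, ... on successive calls."""
    seed = int(time.time() * 1000) if start_ms is None else start_ms
    counter = itertools.count(seed)
    return lambda: next(counter)


next_nonce = make_cache_buster(1749625169269)
print(next_nonce(), next_nonce(), next_nonce())
# 1749625169269 1749625169270 1749625169271
```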

With this API, courses can be selected at 0.076 seconds per course with synchronous code and 0.046 seconds per course with asynchronous code.

**Update**: With the recent version of the EAMIS API, however, frequent requests for either course information or election actions trigger rate limiting. A waiting interval of 0.4 seconds between requests is necessary.
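
One way to respect that interval is a small limiter that sleeps only for the time still missing since the previous request. This is a sketch, not the code used for the timings above; the class name is made up, and the clock and sleep functions are injectable purely so the behavior can be checked without real waiting.

```python
import time


class MinIntervalLimiter:
    """Ensure at least `interval` seconds pass between consecutive `wait()` calls."""

    def __init__(self, interval: float = 0.4, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self) -> float:
        """Block until the interval has elapsed; return how long this call slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self.interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept


limiter = MinIntervalLimiter(interval=0.4)
# Call limiter.wait() immediately before each client.get(...) / client.post(...)
```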

In [None]:
current_time_stamp = int(time.time() * 1000)  # Current timestamp in milliseconds

course_selection_status = client.get(
    "https://eamis.nankai.edu.cn/eams/stdElectCourse!queryStdCount.action",
    headers={
        "Referer": course_category_example["url"],
        "X-Requested-With": "XMLHttpRequest",
    },
    params={"projectId": 1, "semesterId": 4344, "_": current_time_stamp},
)
print(course_selection_status.content)

This API returns the current course election status in the following format:
```js
window.lessonId2Counts={
    '598162': {sc: 0,lc: 0,upsc: 0,uplc: 0,plc: 0,puplc: 0} ...
    '598175': {sc: 153,lc:153,upsc:0,uplc:0,plc:0,puplc:0,
        expLessonGroups:{
            '6440':{indexNo:1,stdCount:150,stdCountLimit:154,proStdCountLimit:154},
            '6441':{indexNo:2,stdCount:7,stdCountLimit:0,proStdCountLimit:0}}
        }
        ...
}
```

This is not important since we don't care about the current course status.