[Bug][Jira] _raw_jira_api_epics table accumulates duplicate data across runs, causing extractEpics subtask performance degradation

### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/incubator-devlake/issues?q=is%3Aissue) and found no similar issues.


### What happened

When running Jira data collection pipelines, we observed that the `_raw_jira_api_epics` table in the DevLake database keeps accumulating duplicate rows for the same Jira epics across collection runs. Each successful run adds a new batch of raw epic data, but the older, apparently identical rows for the same epics from previous runs are neither removed nor updated.

This unbounded growth of duplicate raw data in `_raw_jira_api_epics` causes a significant performance problem. The `extractEpics` subtask, which presumably processes this raw data, takes longer to complete with each collection run because it has to handle an ever-growing volume of redundant rows.
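To make the growth concrete: if every run appends one raw row per epic and nothing is removed, a board with E epics holds k × E rows after k runs, and an extractor that scans the whole raw table processes all of them. The Go sketch below models only that arithmetic. `rawRow`, `collectAppendOnly`, and `simulateRuns` are hypothetical names and the URL is a placeholder; this is not DevLake's collector code or the actual Jira endpoint.

```go
package main

import "fmt"

// rawRow is a hypothetical, simplified stand-in for one row of a raw table
// such as _raw_jira_api_epics; the real columns may differ.
type rawRow struct {
	URL  string
	Data string
}

// collectAppendOnly models the reported behavior: every run appends one row
// per epic and never removes rows written by earlier runs.
func collectAppendOnly(table []rawRow, epicKeys []string) []rawRow {
	for _, key := range epicKeys {
		table = append(table, rawRow{
			URL:  "https://jira.example.com/epic/" + key, // placeholder URL
			Data: `{"key":"` + key + `"}`,
		})
	}
	return table
}

// simulateRuns returns, for each run, how many rows an extractor that scans
// the whole raw table would have to process after that run's collection.
func simulateRuns(epicKeys []string, runs int) []int {
	var table []rawRow
	processed := make([]int, 0, runs)
	for i := 0; i < runs; i++ {
		table = collectAppendOnly(table, epicKeys)
		processed = append(processed, len(table))
	}
	return processed
}

func main() {
	epics := []string{"PROJ-1", "PROJ-2", "PROJ-3"}
	fmt.Println(simulateRuns(epics, 4)) // [3 6 9 12]
}
```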

### What do you expect to happen

We expect DevLake to manage the data in the `_raw_jira_api_epics` table so that identical records for the same source data do not accumulate indefinitely across collection runs.

Ideally, on subsequent collection runs for the same Jira connection and boards:

1. The system should avoid inserting data that is an exact duplicate of what is already present for a given epic.
2. Alternatively, old raw data for epics could be replaced or purged before or after inserting fresh data, ensuring the raw table doesn't grow indefinitely with duplicates.

Preventing this accumulation of duplicates in the raw table should resolve the observed performance degradation and keep the execution time of the `extractEpics` subtask at a consistent level.
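Both expected behaviors can be sketched in Go over an in-memory stand-in for the raw table. This is a minimal illustration under stated assumptions, not DevLake's implementation: `rawRow`, its `Params` field (assumed to identify the connection and board a run collected for), and both functions are hypothetical. `insertSkippingDuplicates` corresponds to option 1 and compares a SHA-256 hash of params, URL, and payload; `replaceForParams` corresponds to option 2 and purges earlier rows for the same params before writing the fresh batch.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// rawRow is a hypothetical raw-table row. Params is assumed to encode the
// scope of a collection run (e.g. connection ID and board ID).
type rawRow struct {
	Params string
	URL    string
	Data   string
}

// contentKey hashes the fields that define "the same raw record".
func contentKey(r rawRow) string {
	sum := sha256.Sum256([]byte(r.Params + "\x00" + r.URL + "\x00" + r.Data))
	return hex.EncodeToString(sum[:])
}

// insertSkippingDuplicates sketches option 1: a row is appended only if no
// row with identical params, URL, and payload is already present.
func insertSkippingDuplicates(table, fresh []rawRow) []rawRow {
	seen := make(map[string]bool, len(table))
	for _, r := range table {
		seen[contentKey(r)] = true
	}
	for _, r := range fresh {
		k := contentKey(r)
		if !seen[k] {
			table = append(table, r)
			seen[k] = true // also drops duplicates within the fresh batch
		}
	}
	return table
}

// replaceForParams sketches option 2: rows previously collected for the same
// params are purged, then the fresh batch is written.
func replaceForParams(table, fresh []rawRow, params string) []rawRow {
	kept := make([]rawRow, 0, len(table)+len(fresh))
	for _, r := range table {
		if r.Params != params {
			kept = append(kept, r) // rows from other scopes are untouched
		}
	}
	return append(kept, fresh...)
}

func main() {
	params := `{"ConnectionId":1,"BoardId":10}` // hypothetical params value
	batch := []rawRow{
		{params, "https://jira.example.com/epic/PROJ-1", `{"key":"PROJ-1"}`},
		{params, "https://jira.example.com/epic/PROJ-2", `{"key":"PROJ-2"}`},
	}
	var a, b []rawRow
	for run := 0; run < 3; run++ {
		a = insertSkippingDuplicates(a, batch)
		b = replaceForParams(b, batch, params)
	}
	fmt.Println(len(a), len(b)) // 2 2
}
```

With either approach, repeated runs over an unchanged board leave the row count flat instead of growing by one batch per run. Option 2 also drops epics that no longer exist upstream, whereas option 1 keeps every earlier version of an epic whose content has since changed.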

### How to reproduce

1. Set up an Apache DevLake instance.
2. Configure a data connection to a Jira instance that contains some epics.
3. Create and run a data collection pipeline using the configured Jira connection for one or more boards containing epics.
4. After the first run completes successfully, trigger and run the same DevLake collection pipeline for the same Jira connection and boards again.
5. Repeat step 4 multiple times (e.g., 2-3 more times).
6. Observe the execution time of the `extractEpics` subtask in the later runs compared to the first run; it should show a noticeable increase.
7. Inspect the contents of the `_raw_jira_api_epics` table in the DevLake database after multiple runs. You should find multiple rows with identical content (representing the same Jira epic, e.g., identified by the same URL), confirming the presence of duplicate data.
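For step 7, the duplicate check amounts to grouping rows by URL and raw payload and reporting groups with more than one row. The Go sketch below assumes the rows have already been read out of `_raw_jira_api_epics` into a hypothetical `rawRow` type holding an ID, the request URL, and the raw JSON payload; these field names are illustrative and may not match the table's actual columns.

```go
package main

import "fmt"

// rawRow is a hypothetical view of a row read back from _raw_jira_api_epics.
type rawRow struct {
	ID   uint64
	URL  string
	Data string
}

// dupGroup lists the IDs of rows that share the same URL and payload.
type dupGroup struct {
	URL string
	IDs []uint64
}

// findDuplicates groups rows by (URL, Data) and returns only the groups that
// contain more than one row, in the order each group was first seen.
func findDuplicates(rows []rawRow) []dupGroup {
	type key struct{ url, data string }
	byContent := make(map[key][]uint64)
	var order []key // first-seen order keeps the output deterministic
	for _, r := range rows {
		k := key{r.URL, r.Data}
		if _, ok := byContent[k]; !ok {
			order = append(order, k)
		}
		byContent[k] = append(byContent[k], r.ID)
	}
	var dups []dupGroup
	for _, k := range order {
		if ids := byContent[k]; len(ids) > 1 {
			dups = append(dups, dupGroup{URL: k.url, IDs: ids})
		}
	}
	return dups
}

func main() {
	rows := []rawRow{
		{1, "https://jira.example.com/epic/PROJ-1", `{"key":"PROJ-1"}`},
		{2, "https://jira.example.com/epic/PROJ-2", `{"key":"PROJ-2"}`},
		{3, "https://jira.example.com/epic/PROJ-1", `{"key":"PROJ-1"}`},
		{4, "https://jira.example.com/epic/PROJ-2", `{"key":"PROJ-2"}`},
	}
	for _, g := range findDuplicates(rows) {
		fmt.Printf("%s appears %d times (ids %v)\n", g.URL, len(g.IDs), g.IDs)
	}
}
```

Grouping on the payload as well as the URL separates true duplicates from rows where an epic genuinely changed between runs.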

### Anything else

_No response_

### Version

main

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)


[Bug][Jira] _raw_jira_api_epics table accumulates duplicate data across runs, causing extractEpics subtask performance degradation #8409
