Can the installation of pandas/numpy be optimized/made optional #194

windrad6 · 2023-06-08T09:09:37Z

After trying to install FiLiP on a Raspberry PI I ran into issues with pandas/numpy. I checked the code and pandas is mainly used in examples, tests and tutorials. For the actual library, it is only used in models/ngsi_v2/units.py for load_units().

Can we make it optional?

It is one of the main drivers for time consumption during setup and a source of incompatibilities on different operating systems.

djs0109 · 2023-06-12T08:05:49Z

Hi, I see the pandas and numpy are also being used in timeseries.py. Do you have an idea how to change it in a clean way? Otherwise, a local branch or a fork would be the quick and dirty solution

github-actions · 2023-06-14T13:29:04Z

Branch 194-Can-the-installation-of-pandas/numpy-be-optimized/made-optional created!

Maghnie · 2023-06-22T12:13:33Z

@windrad6 Could you give a bit more info about the Raspberry PI OS and which commands exactly are you using to install filip?

tstorek · 2023-06-23T11:39:06Z

@windrad6 The TimeSeriesApi strongly relies on pandas as it is designed to parse the data to a format that users can directly trough into theier analysis algorithms. Making pandas optional would have two consequences:

TimeSeries Objects are not convertible to Pandas Dataframes directly. Furthermore, it most likely requires restructuring the code because the imports would be missing in the TimeSeries Models. An inline import in the pandas correlated functions could fix this.
The Code for unit validation would become far more complicated without pandas and require a lot of refactoring to convert it to a basic dict-based code. This should convert the Units to dicts that has either the "Name" or "Code" as keys(). The csv-files in the data folder may prohibit the automatic download via "load_datapackage". Hence, again this would require an inline import in the "load_datapackage" function.
I think that is already partially implemented in the get_keys() function ;)

The installation of numpy/pandas on raspberry pi is a known issue and it is a pitty that there is no real way to fix that without waiting for hours. Nevertheless, it is probably not the worst idea to reduce dependencies of the library if not required :)

Maghnie · 2023-07-24T11:09:34Z

@tstorek Would removing the dependency on pandas become less of a hassle if filip had a separate "lite" installation setup?

Most of the pandas usages are in the examples and tests, which the regular filip user doesn't need.

We would "just" have to figure out how to deal with the usages in Units.

tstorek · 2023-07-24T11:20:21Z

@Maghnie this is incorrect. Examples and tests are independent from the actual package installation. Removing pandas would partly break the timeseries models as already pointed out. Nevertheless, it is probably worth thinking of making unit validation not as strict as it is implemented right now or as an extension package. The current implementation may change with the migration to pydantic v2 (#199) anyway.

The decision on this is up to the core-developer-team. (@djs0109)

djs0109 · 2023-07-24T12:17:59Z

@Maghnie I find your idea of creating a "lite" version good. Do you already have an idea of what functionalities are not required? For example, Unit, TimeSeries API, and so on. I am not sure if a "lite" version can be made within the same repo, or should you make a fork? But as for now, I can only promise support in reviewing because our funded projects have not yet encountered such a requirement.

Maghnie · 2023-07-24T12:27:35Z

this is incorrect.

What is correct then is the practice of making controversial statements to insure a speedy reply, which is confirmed here. :)

Removing pandas would partly break the timeseries models as already pointed out.

I'm not closely familiar with the code but after skimming it, I'm not sure how removing pandas breaks the TimeSeries models, since its to_pandas method seems to be used only in tests.

Maghnie · 2023-07-24T12:30:29Z

Do you already have an idea of what functionalities are not required? For example, Unit, TimeSeries API, and so on.

Yes, I would start there too. We could later look into other bulky packages, if needed.

I am not sure if a "lite" version can be made within the same repo, or should you make a fork?

I imagine it can be done in the same repo, but just with different installation instructions.

But as for now, I can only promise support in reviewing because our funded projects have not yet encountered such a requirement.

Sure, I understand. Filip works in I-GReta too, it just takes a lot of time (aka money) for the installation.

tstorek · 2023-07-24T12:45:33Z

@Maghnie Let me clarify. to_pandas is a functionality provided by filip to the user. This would not be possible without pandas. Hence, removing pandas would break this functionality of the model. But as pointed it out this would need some rafactoring to make it optional.

Examples only use it for demonstration reasons and Tests insure that it works as expected. Usually, when using installation via pip the user does not even regognize their existence.

The Unit model is more complex and would require a little more refactoring because it uses pandas to manage all unit-information. Like a tiny database. Furthermore, the attribute models automatically enforce unit validation and auto-completion if some special keywords are used. Especially, on edge-device level this prohibts falsy unit usage. But apparently to the mentioned use-case it comes with some drawbacks that I was not aware of when I implemented that feature.

Nevertheless, personally, I agree with you that making both functionalities optinal is probably a good idea. But it seems easier as you might expect. That is what I tried to say. Nevertheless, splitting the library into several parts will most likely favor its usabilty in different scenarios.

BTW: I suggest to have a look into optional imports and extra dependencies before starting an additonal repo:

try:
   import pandas as pd
   has_pandas = True
ecxept ImportError():
  has_pandas = False


if has_pandas:
  do something with pandas....
else:
  raise ImportError()

Furthermore, you need to adjust the setup.py and add extras. Have a look here: https://github.com/RWTH-EBC/ebcpy/blob/master/setup.py

You can find further exmaples online.

Maghnie · 2024-07-10T12:24:08Z

@windrad6 As a workaround for the time-consuming setup, could you try installing Filip with uv?

uv seems to be more efficient than "normal pip" as an installer and package resolver.

windrad6 added the bug Something isn't working label Jun 8, 2023

Maghnie self-assigned this Jun 14, 2023

Maghnie added the question Further information is requested label Jun 22, 2023

djs0109 added the feature request Request a potential feature label Jul 24, 2023

djs0109 removed the bug Something isn't working label Jan 15, 2024

tstorek mentioned this issue May 8, 2024

Refactor unit models #280

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Can the installation of pandas/numpy be optimized/made optional #194

Can the installation of pandas/numpy be optimized/made optional #194

windrad6 commented Jun 8, 2023

djs0109 commented Jun 12, 2023

github-actions bot commented Jun 14, 2023

Maghnie commented Jun 22, 2023

tstorek commented Jun 23, 2023 •

edited

Loading

Maghnie commented Jul 24, 2023

tstorek commented Jul 24, 2023

djs0109 commented Jul 24, 2023

Maghnie commented Jul 24, 2023

Maghnie commented Jul 24, 2023

tstorek commented Jul 24, 2023 •

edited

Loading

Maghnie commented Jul 10, 2024

Can the installation of pandas/numpy be optimized/made optional #194

Can the installation of pandas/numpy be optimized/made optional #194

Comments

windrad6 commented Jun 8, 2023

djs0109 commented Jun 12, 2023

github-actions bot commented Jun 14, 2023

Maghnie commented Jun 22, 2023

tstorek commented Jun 23, 2023 • edited Loading

Maghnie commented Jul 24, 2023

tstorek commented Jul 24, 2023

djs0109 commented Jul 24, 2023

Maghnie commented Jul 24, 2023

Maghnie commented Jul 24, 2023

tstorek commented Jul 24, 2023 • edited Loading

Maghnie commented Jul 10, 2024

tstorek commented Jun 23, 2023 •

edited

Loading

tstorek commented Jul 24, 2023 •

edited

Loading