Bitmaps? #14
Comments
@Oceania2018 Thanks for the info, but there aren't a lot of 10k images that need processing. For instance, the one I'm working on now is 2531 x 2081 at 3 bytes per pixel, which makes for > 15.8 million bytes in a 3-dimensional array. Obviously, anything that big would be slower, but I'm not sure I want to find out how much slower. I may have to code it up just to see.
@fdncred Hi, I haven't tested a dataset that large. There is definitely a way to optimize it. Like use
@Oceania2018 I experimented with Bitmaps and it just crashes Visual Studio when I try to inspect the np variable. I take that to mean that either the arrays are so large that NumSharp can't handle them or I've constructed the numpy array incorrectly. I suspect I'm not using NumSharp correctly. Any ideas? My system is pretty beefy: 16 GB RAM, 12 CPUs. This is what I did.
Note: If you uncomment the byteImage code, my code creates a 3-dimensional byte array. I use this in other places and it works great. This code was meant for testing and only really handles 32-bpp and 24-bpp images. The intent of the code was to create a 2D array of image data by following the example of the Array2Dim TestMethod. I know it's not right, since np[0] only has two values where it should have 3. I'd like to create a 3D array like the one in byteImage, but I'm not sure how to do that with NumSharp.
private void BitmapToArray(string notes1a)
{
var bmp = (System.Drawing.Bitmap)System.Drawing.Image.FromFile(notes1a);
var bmpd = bmp.LockBits(new System.Drawing.Rectangle(0, 0, bmp.Width, bmp.Height), System.Drawing.Imaging.ImageLockMode.ReadWrite, bmp.PixelFormat);
var dataSize = bmpd.Stride * bmpd.Height;
byte[] data = new byte[dataSize];
Marshal.Copy(bmpd.Scan0, data, 0, data.Length);
bmp.UnlockBits(bmpd);
var includeAlpha = false;
var stride = bmpd.Stride;
//var byteImage = new byte[bmpd.Height][][];
var w = bmpd.Width;
var dataLen = data.Length / 4;
var np = new NumSharp.NDArray<List<int>>();
var list = new List<List<int>>();
for (int i = 0; i < dataLen; i++)
{
var x = i % w;
var y = i / w;
//if (x == 0)
// byteImage[y] = new byte[w][];
var o = (y * stride + x * 4);
if (includeAlpha)
{
//byteImage[y][x] = new byte[] { data[o], data[o + 3], data[o + 2], data[o + 1] };
list.Add(new List<int>() { data[o], data[o + 3], data[o + 2], data[o + 1] });
}
else // FYI - Data is in BGR layout
{
//byteImage[y][x] = new byte[] { data[o + 3], data[o + 2], data[o + 1] };
list.Add(new List<int>() { data[o + 3], data[o + 2], data[o + 1] });
}
}
np = np.Array(list);
}
@fdncred Hm, I will check your code with Visual Studio Code on Windows with .NET Core 2.1, and maybe I will try to use NDArray<double[]> somehow. I am not 100% sure that List is the best data type. We use it often in tests because the lists (arrays) there are small, but it is possible to use double[], i.e. C# arrays, instead. They perform much better.
@fdncred Not sure if it is important, but which operating system do you use? Normal Windows?
Windows 10 1809 Build 17763.55 64-bit
This may be closer, but it's still not right because the shape is wrong.
private void BitmapToArray(string notes1a)
{
var bmp = (System.Drawing.Bitmap)System.Drawing.Image.FromFile(notes1a);
var bmpd = bmp.LockBits(new System.Drawing.Rectangle(0, 0, bmp.Width, bmp.Height), System.Drawing.Imaging.ImageLockMode.ReadWrite, bmp.PixelFormat);
var dataSize = bmpd.Stride * bmpd.Height;
byte[] data = new byte[dataSize];
Marshal.Copy(bmpd.Scan0, data, 0, data.Length);
bmp.UnlockBits(bmpd);
var includeAlpha = false;
var stride = bmpd.Stride;
//var byteImage = new byte[bmpd.Height][][];
var w = bmpd.Width;
var h = bmpd.Height;
var dataLen = data.Length / 4;
var arr = new NumSharp.NDArray<NumSharp.NDArray<NumSharp.NDArray<byte>>>();
arr.Data = new NumSharp.NDArray<NumSharp.NDArray<byte>>[h];
for (int i = 0; i < dataLen; i++)
{
var x = i % w;
var y = i / w;
if (x == 0)
{
//byteImage[y] = new byte[w][];
arr[y] = new NumSharp.NDArray<NumSharp.NDArray<byte>>();
arr[y].Data = new NumSharp.NDArray<byte>[w];
}
var o = (y * stride + x * 4);
if (includeAlpha)
{
//byteImage[y][x] = new byte[] { data[o], data[o + 3], data[o + 2], data[o + 1] };
arr[y][x] = new NumSharp.NDArray<byte>();
arr[y][x].Data = new byte[4];
arr[y][x].Data[0] = data[o];
arr[y][x].Data[1] = data[o + 3];
arr[y][x].Data[2] = data[o + 2];
arr[y][x].Data[3] = data[o + 1];
}
else // FYI - Data is in BGR layout
{
//byteImage[y][x] = new byte[] { data[o + 3], data[o + 2], data[o + 1] };
arr[y][x] = new NumSharp.NDArray<byte>();
arr[y][x].Data = new byte[3];
arr[y][x].Data[0] = data[o + 3];
arr[y][x].Data[1] = data[o + 2];
arr[y][x].Data[2] = data[o + 1];
}
}
}
I'm trying to match the array from this Python code, which has shape (2531, 2081, 3).
pil_img = Image.open(filename)
img = np.array(pil_img)
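For readers following along: the bytes returned by LockBits and Marshal.Copy are laid out row by row, and each row occupies Stride bytes, which may include padding after the last pixel. A minimal sketch of packing such a buffer into a tight row-major array that matches a (height, width, channels) shape could look like the following. BitmapPacking and PackRows are hypothetical names used only for illustration, not NumSharp API, and the sketch assumes a positive stride and whole bytes per pixel (for example 3 for 24-bpp, 4 for 32-bpp).

```csharp
using System;

// Hypothetical helper for illustration only; not part of NumSharp.
public static class BitmapPacking
{
    // Copies pixels out of a stride-padded buffer into a tightly packed array
    // laid out row-major as [height, width, channels].
    // Assumes stride > 0 and that each pixel uses a whole number of bytes.
    public static byte[] PackRows(byte[] data, int stride, int width, int height,
                                  int bytesPerPixel, int channels)
    {
        if (channels > bytesPerPixel)
            throw new ArgumentException("channels cannot exceed bytesPerPixel");
        if (stride < width * bytesPerPixel)
            throw new ArgumentException("stride is too small for the given width");
        if (data.Length < stride * height)
            throw new ArgumentException("data is too short for the given stride and height");

        var packed = new byte[height * width * channels];
        int dst = 0;
        for (int y = 0; y < height; y++)
        {
            int rowStart = y * stride;                 // each row begins every `stride` bytes
            for (int x = 0; x < width; x++)
            {
                int src = rowStart + x * bytesPerPixel;
                for (int c = 0; c < channels; c++)
                    packed[dst++] = data[src + c];     // keeps the source channel order
            }
        }
        return packed;
    }
}
```

With a 24-bpp image one would call PackRows(data, bmpd.Stride, bmpd.Width, bmpd.Height, 3, 3); with a 32-bpp image, passing bytesPerPixel 4 and channels 3 drops the fourth byte of each pixel. The channel order is left exactly as it is in the source buffer.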
Understood. When I am at home I may try to extend the Array method for this. By the way, thanks for showing us the Python code. It is important that we match the APIs as closely as possible.
@fdncred Do you use the code from GitHub and build it yourself, or do you take the NuGet package? Just so I know how best to support your case.
I downloaded and compiled the code from Github. That's what I meant when I said this above.
|
Ah yes. Lol, sorry, my fault. OK, I will test it at home.
No worries, thanks for your help.
@dotChris90 Do you think we should refactor the NDArray class for each specific generic type? That is, separate NDArray into NDArray<double[]>, or NDArray<T> where T is limited to value types, and change
to
For 3 dimensions it would be
I think this will definitely give the best performance.
@Oceania2018 Yes, maybe we should consider some restructuring. Performance Generic aspect var A = new NDArray<double[][]>().Array( .... ); var c = A.Dot(b); It is quite clean, since you can see "OK, A is an array of arrays, so a matrix" and "OK, b is an array". So an NDArray would look like this: public partial class NDArray<TData> where TData : IList
@Oceania2018 What do you think? Honestly, I do not want to start something like "NDArray2" or "NDArray3", because that is not the numpy API ;)
An alternate approach is to compile the numpy source code into a C++ DLL and then P/Invoke calls into it. This is kind of what Python does. Numpy isn't written in Python, just the wrapper is. Then you'd have all the speed of numpy, and one would have to figure out how to marshal data back and forth. Update
But I did find this. Looks like it could be helpful. |
@dotChris90 I like the jagged array. var A = new NDArray<double[][]>()
@Oceania2018 OK, if you do not mind, I would do a complete restructure on Friday and over the weekend (I have some holidays). I suggest that just one of us (so me) does this, because it also includes changing the unit tests etc.
I have another idea. What about creating a new class named NumSharp? It would be the equivalent of np when you do var np = new NumSharp(); then np.arange(10). NumSharp will act like a router.
NumSharp will hide the mess of NDArray. I agree with you: you do the restructuring. Appreciated.
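To make the "router" idea concrete, here is a minimal sketch of what such an entry-point class might look like. NumSharpFacade is a hypothetical name chosen to avoid implying this is the real NumSharp class, and it returns a plain int[] instead of an NDArray so that the example stays self-contained.

```csharp
using System;

// Hypothetical sketch of the "router" idea; not the actual NumSharp class.
public sealed class NumSharpFacade
{
    // Mirrors numpy's np.arange(stop) for integers: 0, 1, ..., stop - 1.
    // Like numpy, a negative stop yields an empty array.
    public int[] arange(int stop)
    {
        if (stop < 0) stop = 0;
        var result = new int[stop];
        for (int i = 0; i < stop; i++)
            result[i] = i;
        return result;
    }
}
```

Usage would then read much like Python: var np = new NumSharpFacade(); var a = np.arange(10);. In a real implementation each such method would forward to the array type's own constructors and operations, which is what lets calling code stay close to the numpy spelling.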
@fdncred Interesting. It seems NumSharp is not the only project trying to reconstruct numpy, lol. Thanks for the post. I think that .NET Core 3.x will bring a lot more for machine learning, array performance and so on. That is the reason I am avoiding C at the moment. But if we find out in 2019 that .NET Core 3.0 does not bring us what we want, maybe we will go with C. About the numpy project: I think at the moment they use Python's internal mechanisms by including Python.h in their files. If we integrated this into .NET, it would feel like a little too much wrapping, and we would still have to implement the classes. I would really like it if the numpy team wrote their code in C, compiled it to a shared object, and linked their Python object code against that shared object. Anyway, maybe we can have a look at their GitHub repo :D
@Oceania2018 Honestly, np = new NumSharp() is a fantastic idea. Lol, this makes everything look more like numpy. We could try to use .NET scripting or PowerShell and make some examples after restructuring the array code.
@dotChris90 Sounds great, let's do it. I will add a NumSharp class, you do the NDArray restructuring.
@fdncred Are you interested in joining this project? I created a new issue, #34.
@dotChris90 I posted that C++ link in order to help with the port to C#. For me, at least, it's easier to read C/C++ and turn it into C# than it is to read Python and turn it into C#. Here's another C++ port of the NumPy functionality, with help. Again, it may just be useful for seeing how other people reinterpret numpy.
@fdncred I made a test method for your case to try out this byte[][][] use case. In Visual Studio Code the debugger works fine for this image (slow, but fine), but not for our NDArray; so far I have only tried byte[][][]. I suggest that I restructure our NDArray this week and extend the Array method. Honestly, until now we did not think about tensor types like byte[][][]. Maybe that is the reason the shaping method does not work properly. I will let you know when the restructure is finished. Then you can try something like var myArray = new NDArray<byte[][][]>().Array(new Bitmap("pathToImage")); For now, if you want to play around with the code, maybe you could try: var myArray = new NDArray<byte[][]>(); // an NDArray of byte arrays of byte arrays; it looks like a matrix, which is why I want to restructure
@fdncred Sure. Honestly, the link was interesting, and I totally agree with you. C++ and C# are much closer to each other than either is to Python. Python is a nice language, but a lot of things are missing; generics, just as an example. Maybe I will take a deeper look at these C++ projects.
@fdncred Just a question out of curiosity: what API would you suggest we implement for byte[][][]? In other words, what would be good to have for images?
@fdncred @Oceania2018 I checked your link https://xtensor.readthedocs.io/en/latest/numpy.html. Amazing! But I asked myself: it is not possible to have an array<double,2> generic in C#, am I right? It looks extremely nice for users, but I have never seen this in C# or in the .NET world in general.
@dotChris90 I just discussed this with someone else. We have another solution. Please hold on and don't make any changes.
@Oceania2018 OK, I will do nothing for today. But what was the discussion about: the NDArray<double[][]>, the Bitmap, or np = new NumSharp()? :D
I pushed code. Please refer
|
@dotChris90 I don't understand your question about what API for byte[][][]. Sorry. Having an image in a byte[][] or byte[][][] is only useful as it relates to numpy's algorithms. If you look at this python project you'll see how they're using it. This python project is where I got the idea to use NumSharp when I converted it to C#. |
@fdncred I think we would do it like this:
|
@Oceania2018 That seems intuitive to me as long as it returns np[height][width][byte[3]]. I think that's what python is doing but maybe it's returning tuples - I can never tell with imaging on python. |
@fdncred The project page is enough. Until now I have mostly worked with time series, not much with images :D So I don't know well which functions are used most. I was just looking for some inspiration or use cases.
@fdncred I created a 1M-byte array; it cost 38 ms. @Oceania2018
@Oceania2018 Do I understand you correctly? You want to store everything (1D, 2D, 3D, ... ND) in a single array, with properties like Shape deciding the dimension? Don't get me wrong, but I think this will lead to some problems.
1) Our methods will get longer and less well structured. Until now we can have "MethodName(NDArray<double>)" and "MethodName(NDArray<double[]>)" to distinguish between vector and matrix. Thanks to polymorphism (overloading) we can have two methods with the same name but different parameters. If our objects are always NDArray, you cannot do this and instead always need a huge "if else" structure: if the method sees a vector do this, if a matrix do that. This leads to fewer files but also increases the danger that several people have to work on one file at the same time.
2) It is not really OOP, in my opinion. In OOP we say "this is a matrix and it has these methods and properties" and "this is a vector with these properties and methods". But here we say it is an array that can be anything. That is dynamic interpreter style, not compiler style. It is Python, not C#.
3) Performance. Sorry to say, but I am not sure a huge array brings better performance. We should run some tests to find out what is best, but I am fairly sure jagged arrays are faster than one huge array where you first have to locate the element on every access.
4) Do you really, really want to rewrite all the operations and methods? It will be hard, because on Stack Overflow you will find code examples with double[][] and so on, but never an example with double[] for a matrix.
5) Why do you want to create your own array? The .NET world already has very good and optimized array types. Python does not, so they developed one from scratch. So we should stay with the .NET array type system.
So please give me some reasons why NDArray<double> matrix = new NDArray<double>().Array(...) is better than NDArray<double[][]> matrix = new NDArray<double[][]>().Array()?
I know QuantStack does it this way, and I find it a little weird, since I CANNOT SEE FROM MY CODE WHAT NUMERIC TYPE I HAVE. Have a look again: var matrix = new NDArray<double[][]>() // I can see 100% that it must be a matrix. So give me some arguments and pros. I do not want something like "because QuantStack does it like this". I want something like "better performance for matrix multiplication", because honestly all the points I listed make me uncomfortable with the QuantStack solution at the moment.
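For context on the design being debated: in the single-array approach the elements live in one flat buffer, and a shape turns an N-dimensional index into an offset using row-major strides. The sketch below shows that bookkeeping under stated assumptions. FlatArray is a hypothetical type written for this illustration, not the NumSharp implementation, and it says nothing about which layout is faster.

```csharp
using System;

// Hypothetical illustration of flat storage plus shape; not NumSharp's API.
public sealed class FlatArray<T>
{
    public T[] Data { get; }
    public int[] Shape { get; }
    private readonly int[] _strides;   // elements to skip per step in each dimension

    public FlatArray(T[] data, params int[] shape)
    {
        int count = 1;
        foreach (int dim in shape) count *= dim;
        if (count != data.Length)
            throw new ArgumentException("shape does not match data length");

        Data = data;
        Shape = (int[])shape.Clone();

        // Row-major strides: the last dimension is contiguous.
        _strides = new int[shape.Length];
        int step = 1;
        for (int d = shape.Length - 1; d >= 0; d--)
        {
            _strides[d] = step;
            step *= shape[d];
        }
    }

    public T this[params int[] index]
    {
        get => Data[Offset(index)];
        set => Data[Offset(index)] = value;
    }

    private int Offset(int[] index)
    {
        if (index.Length != Shape.Length)
            throw new ArgumentException("wrong number of indices");
        int offset = 0;
        for (int d = 0; d < index.Length; d++)
        {
            if (index[d] < 0 || index[d] >= Shape[d])
                throw new IndexOutOfRangeException();
            offset += index[d] * _strides[d];
        }
        return offset;
    }
}
```

For an image this would be new FlatArray<byte>(pixels, height, width, 3), with element [y, x, c] stored at offset (y * width + x) * 3 + c. The dimensionality is then a runtime property of the object and is not visible in the static type, which is the trade-off raised in the comment above.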
@Oceania2018 I will open another issue to discuss this. "Bitmaps" is not the best name for it ;)
@fdncred I pushed an Array method to NumSharp which accepts a Bitmap object as its input parameter. You can try it and play with it. :) With the new 1D array strategy we could simply take the byte array from the Marshal.Copy call and put it into the NDArray Data property. We just need to set the shape to (height, width, 3). The only thing I don't understand is why the order of the RGB vector differs between numpy and Marshal.Copy. Shall we correct this?
@fdncred I took your image and example code for the method and the unit test. Hope you don't mind :)
@dotChris90 I'll take a look at it. I have no problem with you using any of the code I've pushed or put in issues, so feel free to use it without asking. The thing about dotnet bitmaps is that they're stored in BGR format, so that may be why the vector is different. Typically there's a byte swap of R and B to get them aligned properly. I see some things I'd change, but this is definitely a good start. We just have to figure out which BPP values we will support and be able to handle those flavors of bitmap. For speed we could also use unsafe access on bmp.Scan0 instead of marshaling. Marshaling isn't exactly fast, but we can decide that later.
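The byte swap mentioned here can be sketched as follows, assuming the pixel data has already been packed tightly with no row padding and at least 3 channels per pixel. ChannelOrder and SwapRedBlue are hypothetical names for illustration, not NumSharp API.

```csharp
using System;

// Hypothetical helper for illustration only; not part of NumSharp.
public static class ChannelOrder
{
    // Swaps the first and third channel of every pixel in place,
    // turning BGR(A) data into RGB(A) data or vice versa.
    // Assumes `pixels` is tightly packed: no padding between rows.
    public static void SwapRedBlue(byte[] pixels, int channels)
    {
        if (channels < 3)
            throw new ArgumentException("need at least 3 channels per pixel");
        if (pixels.Length % channels != 0)
            throw new ArgumentException("length is not a multiple of channels");

        for (int i = 0; i < pixels.Length; i += channels)
        {
            byte tmp = pixels[i];
            pixels[i] = pixels[i + 2];
            pixels[i + 2] = tmp;
        }
    }
}
```

After the swap, a buffer with shape (height, width, 3) has its channels in R, G, B order, the order the thread attributes to the numpy array. Any fourth byte per pixel is left in place.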
@fdncred totally agree. First let make a nice start for NumSharp. :) |
I'm interested in loading a bitmap in to a NumSharp array. I realize that isn't written yet but I'm concerned that if I write and contribute that method it will be way too slow to do anything with. What are your thoughts on speed?
Thanks,
Darren
EDIT:
System.Drawing.Bitmap are now supported by a separate package, read more.